Abstract: The positive false discovery rate (pFDR) is a useful overall measure
of errors for multiple hypothesis testing, especially when the underlying
goal is to attain one or more discoveries. Control of pFDR critically depends 
on how much evidence is available from data to distinguish between false 
and true nulls. Oftentimes, as many aspects of the data distributions are 
unknown, one may not be able to obtain strong enough evidence from the 
data for pFDR control. This raises the question as to how much data are 
needed to attain a target pFDR level. We study the asymptotics of the 
minimum number of observations per null for the pFDR control associated 
with multiple Studentized tests and F tests, especially when the differences 
between false nulls and true nulls are small. For Studentized tests, we con- 
sider tests on shifts or other parameters associated with normal and general 
distributions. For F tests, we also take into account the effect of the num- 
ber of covariates in linear regression. The results show that in determining 
the minimum sample size per null for pFDR control, higher order statistical 
properties of data are important, and the number of covariates is important 
in tests to detect regression effects. 
1. Introduction

A fundamental issue for multiple hypothesis testing is how to effectively control 
Type I errors, namely the errors of rejecting null hypotheses that are actually 
true. The False Discovery Rate (FDR) control has generated a lot of interest 
due to its more balanced trade-off between error rate control and power than 
the traditional Familywise Error Rate control [1]. For recent progress on FDR
control and its generalizations, see [6-12, 14-16, 19] and references therein.



Let R be the number of rejected nulls and V the number of rejected true nulls. 
By definition, FDR = $E[V/(R \vee 1)]$. Therefore, in FDR control, the case R = 0 is
counted as "error-free", which turns out to be important for the controllability
of the FDR. However, multiple testing procedures are often used in situations
where one explicitly or implicitly aims to obtain a nonempty set of rejected 
nulls. To take into account this mind-set in multiple testing, it is appropriate to 
control the positive FDR (pFDR) as well, which is defined as $E[V/R \mid R > 0]$
(fljl). Clearly, when all the nulls are true, the pFDR is 1 and therefore cannot 
be controlled. This is a reason why the FDR is defined as it is. On the other
hand, even when there is a positive proportion of nulls that are false, the pFDR 
can still be significantly greater than the FDR, such that when some nulls are 
indeed rejected, chances are that a large proportion or even almost all of them are
falsely rejected (0, 0)- 

The gap between FDR and pFDR arises when the test statistics cannot pro- 
vide arbitrarily strong evidence against nulls (0). Such test statistics include t 
and F statistics (0). These two share a common feature, that is, they are used 
when the standard deviations of the normal distributions underlying the data 
are unknown. In reality, it is a rule rather than exception that data distributions 
are only known partially. This suggests that, when evaluating rejected nulls, it 
is necessary to realize that the FDR and pFDR can be quite different, especially 
when the former is low. 

In order to increase the evidence against nulls, a guiding principle is to in- 
crease the number of observations for each null, denoted n for the time being. In 
contrast to single hypothesis testing, for problems that involve a large number 
of nulls, even a small increase in n will result in a significant increase in the de- 
mand on resources. For this reason, the issue of sample size per null for multiple 
testing needs to be dealt with more carefully. It is known that FDR and other 
types of error rates decrease in the order of $O(\sqrt{\log n/n})$ [13]. In this work, we
will consider the relationship between n and pFDR control, in particular, for the 
case where false nulls are hard to separate from true ones. The basic question to 
be considered is: in order to attain a certain level of pFDR, what is the minimum
value for n? This question involves several issues. First, how does the complexity
of the null distribution affect n? Second, is normal or t approximation appropri- 
ate in determining n? In other words, is it necessary to incorporate information 
on higher order moments of the data distribution? Third, what would be an at- 
tainable upper bound for the performance of a multiple testing procedure based 
on partial knowledge of the data distributions? 

In the rest of the section, we first set up the framework for our discussion, 
and then outline the other sections. 

1.1. Setup and basic approach 

Most of the discussions will be made under a random effects model Gil El- 
Each null $H_i$ is associated with a distribution $F_i$ and tested based on $\xi_i =
\xi(X_{i1}, \ldots, X_{in})$, where $X_{i1}, \ldots, X_{in}$ are iid $\sim F_i$ and the function $\xi$ is the same
for all $H_i$. Let $\theta_i = 1\{H_i \text{ is false}\}$. The random effects model assumes that
$(\theta_i, \xi_i)$ are independent, such that

$$\theta_i \sim \mathrm{Bernoulli}(\pi), \qquad \xi_i \mid \theta_i \sim \begin{cases} P_0^{(n)} \text{ with density } p_0^{(n)}, & \text{if } \theta_i = 0, \\ P_1^{(n)} \text{ with density } p_1^{(n)}, & \text{if } \theta_i = 1, \end{cases} \qquad (1.1)$$

where $\pi \in [0, 1]$ is a fixed population proportion of false nulls among all the
nulls. Note that $P_0^{(n)}$ and $P_1^{(n)}$ depend on n, the number of observations for each null.
It follows that the minimum pFDR is (cf. Qj) 



$$\alpha_* = \frac{1-\pi}{1-\pi+\pi\rho_n}, \quad \text{with } \rho_n := \sup \frac{p_1^{(n)}}{p_0^{(n)}}. \qquad (1.2)$$

In order to attain pFDR $\le \alpha$, there must be $\alpha_* \le \alpha$, which is equivalent to
$(1-\alpha)(1-\pi)/(\alpha\pi) \le \rho_n$. For many tests, such as t and F tests, $\rho_n < \infty$ and
$\rho_n \uparrow \infty$ as $n \to \infty$. Then, the minimum sample size per null is

$$n_* = \min\{n : (1-\alpha)(1-\pi)/(\alpha\pi) \le \rho_n\}. \qquad (1.3)$$



In general, the smaller the difference between the distributions under false
nulls and those under true nulls, the smaller $\rho_n$ becomes, and hence the larger
$n_*$ has to be. Our interest is how $n_*$ should grow as the difference between the
distributions tends to 0.

Notation. Because $(1-\alpha)(1-\pi)/(\alpha\pi)$ regularly appears in our results, it will
be denoted by $Q_{\alpha,\pi}$ from now on.
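The relations (1.2) and (1.3) can be sketched numerically as follows. This is a minimal illustration, not the paper's procedure: the function names are hypothetical, and the toy sequence $\rho_n = e^{0.1 n}$ is an assumption chosen only to make the search concrete.

```python
import math

def q_threshold(alpha, pi):
    """Q_{alpha,pi} = (1 - alpha)(1 - pi) / (alpha * pi)."""
    return (1.0 - alpha) * (1.0 - pi) / (alpha * pi)

def min_pfdr(pi, rho):
    """Minimum attainable pFDR in (1.2), given the likelihood-ratio supremum rho."""
    return (1.0 - pi) / (1.0 - pi + pi * rho)

def min_sample_size(rho_fn, alpha, pi, n_max=10**6):
    """Smallest n <= n_max with rho_fn(n) >= Q_{alpha,pi}, as in (1.3); None if not found."""
    q = q_threshold(alpha, pi)
    for n in range(1, n_max + 1):
        if rho_fn(n) >= q:
            return n
    return None

# Hypothetical toy sequence rho_n = exp(0.1 * n), used for illustration only.
toy_rho = lambda n: math.exp(0.1 * n)
n_star = min_sample_size(toy_rho, alpha=0.05, pi=0.1)
```

For an actual test, `rho_fn` would be replaced by the supremum of the likelihood ratio of that test as a function of n.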



1.2. Outlines of other sections 

Section 2 considers t tests for normal distributions. The nulls are $H_i: \mu_i = 0$
for $N(\mu_i, \sigma_i)$, with $\sigma_i$ unknown. It will be shown that if $\mu_i/\sigma_i = r$ for false
nulls, then, as $r \downarrow 0$, the minimum sample size per null $\sim (1/r)\ln Q_{\alpha,\pi}$, and
therefore it depends on at least 3 factors: 1) the target pFDR control level, $\alpha$,
2) the proportion of false nulls among the nulls, $\pi$, and 3) the distributional
properties of the data, as reflected by $\mu_i/\sigma_i$. In contrast, for FDR control, there
is no constraint on the sample size per null. The case where $\mu_i/\sigma_i$ associated
with false nulls are sampled from a distribution will be considered as well. This
section also illustrates the basic technique used throughout the article.

Section 3 considers F tests. The nulls are $H_i: \beta_i = 0$ for $Y = \beta_i^T X + \epsilon$,
where X consists of p covariates and $\epsilon \sim N(0, \sigma_i)$ is independent of X. Each
$H_i$ is tested with the F statistic of a sample $(Y_{ik}, X_k)$, $k = 1, \ldots, n+p$, where
$n \ge 1$ and $X_1, \ldots, X_{n+p}$ constitute a fixed design for the nulls. Note that n now
stands for the difference between the sample size per null and the number of
covariates included in the regression. The asymptotics of $n_*$, the minimum value
for n in order to attain a given pFDR level, will be considered as the regression
effects become increasingly weak and/or as p increases. It will be seen that $n_*$
must stay positive. The weaker the regression effects are, the larger $n_*$ has to
be. Under certain conditions, $n_*$ should increase at least as fast as p.

Section 4 considers t tests for arbitrary distributions. We consider the case
where estimates of means and variances are derived from separate samples, 
which allows detailed analysis with currently available tools, in particular, uni- 
form exact large deviations principle (LDP) Q). It will be shown that the mini- 
mum sample size per null depends on the cumulant generating functions of the 
distributions, and thus on their higher order moments. The asymptotic results 
will be illustrated with examples of uniform distributions and Gamma distribu- 
tions. An example of normal distributions will also be given to show that the 
results are consistent with those in Section 2. We will also consider how to split
the random samples for the estimation of mean and the estimation of variance 
in order to minimize the sample size per null. 

Section 5 considers tests based on partial information on the data distributions.
The study is part of an effort to address the following question: when
knowledge about data distributions is incomplete and hence Studentized tests 
are used, what would be the attainable minimum sample size per null. Under 
the condition that the actual distributions belong to a parametric family which 
is unknown to the data analyzer, a Studentized likelihood test will be studied. 
We conjecture that the Studentized likelihood test attains the minimum sam- 
ple size per null. Examples of normal distributions, Cauchy distributions, and 
Gamma distributions will be given. 

Section 6 concludes the article with a brief summary. Most of the mathematical
details are collected in the Appendix.

2. Multiple t-tests for normal distributions
2.1. Main results 

Suppose we wish to conduct hypothesis tests for a large number of normal
distributions $N(\mu_i, \sigma_i)$. However, neither $\sigma_i$ nor any possible relationships among
$(\mu_i, \sigma_i)$, $i \ge 1$, are known. Under this circumstance, in order to test $H_i: \mu_i = 0$
simultaneously for all $N(\mu_i, \sigma_i)$, an appropriate approach is to use the t statistics
of iid samples $Y_{i1}, \ldots, Y_{i,n+1} \sim N(\mu_i, \sigma_i)$:

$$T_i = \frac{\sqrt{n+1}\,\bar Y_i}{S_i}, \quad \bar Y_i = \frac{1}{n+1}\sum_{k=1}^{n+1} Y_{ik}, \quad S_i^2 = \frac{1}{n}\sum_{k=1}^{n+1} (Y_{ik} - \bar Y_i)^2. \qquad (2.1)$$

Suppose the sample size n + 1 is the same for all $H_i$ and the samples from
different normal distributions are independent of each other.

Under the random effects model (1.1), we first consider a case where the
distributions with false $H_i$ share a common characteristic, i.e., the signal-noise
ratio defined in the remark following Theorem 2.1.

Theorem 2.1. Under the above condition, suppose that, unknown to the data
analyzer, when $H_i$ is false, $\mu_i/\sigma_i = r > 0$, where r is a constant independent
of i. Given $0 < \alpha < 1$, let $n_*$ be the minimum value of n in order to attain
pFDR $\le \alpha$. Then $n_* \sim (1/r)\ln Q_{\alpha,\pi}$ as $r \to 0+$.

Remark. We will refer to r as the signal-noise ratio (SNR) of the multiple
testing problem in Theorem 2.1.

Theorem 2.1 can be generalized to the case where the SNR follows a distribution.
To specify how the SNR becomes increasingly small, we introduce a "scale"
parameter $s > 0$ and parameterize the SNR distribution as $G_s(r) = G(sr)$,
where G is a fixed distribution.

Corollary 2.1. Suppose that when $H_i: \mu_i = 0$ is false, $r_i = \mu_i/\sigma_i$ is a random
sample from $G(sr)$, where $G(r)$ is a distribution function with support
on $(0, \infty)$ and is unknown to the data analyzer. Suppose there is $\lambda > 0$, such
that $\int e^{\lambda r}\,G(dr) < \infty$. Let $L_G$ be the Laplace transform of G, i.e., $L_G(\lambda) =
\int e^{\lambda r}\,G(dr)$. Then $n_* \sim (1/s)\,L_G^{-1}(Q_{\alpha,\pi})$ as $s \to 0$.
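As a rough numerical illustration of Corollary 2.1, the sketch below inverts $L_G$ by bisection and evaluates the leading-order value $(1/s)L_G^{-1}(Q_{\alpha,\pi})$. The helper names and the two choices of G (the standard exponential distribution, with $L_G(\lambda) = 1/(1-\lambda)$ for $\lambda < 1$, and a point mass at 1, which recovers Theorem 2.1) are assumptions made for illustration.

```python
import math

def laplace_inverse(laplace_fn, target, lam_max, tol=1e-12):
    """Solve laplace_fn(lam) = target on (0, lam_max) by bisection.

    Assumes laplace_fn is increasing with laplace_fn(0) = 1 < target <= laplace_fn(lam_max).
    """
    lo, hi = 0.0, lam_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if laplace_fn(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def asymptotic_n_star(s, lam):
    """Leading-order minimum sample size (1/s) * L_G^{-1}(Q) for small scale s."""
    return lam / s

q = (1.0 - 0.05) * (1.0 - 0.1) / (0.05 * 0.1)           # Q for alpha = 0.05, pi = 0.1

# G = Exponential(1): L_G(lam) = 1 / (1 - lam) for lam < 1.
lam_exp = laplace_inverse(lambda lam: 1.0 / (1.0 - lam), q, lam_max=1.0 - 1e-9)

# G = point mass at 1: L_G(lam) = exp(lam), so the inverse is ln Q as in Theorem 2.1.
lam_point = laplace_inverse(math.exp, q, lam_max=50.0)

n_approx = asymptotic_n_star(0.01, lam_exp)
```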



2.2. Preliminaries



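Recall that, for the t statistic (2.1), if $\mu = 0$, then $T \sim t_n$, the t distribution
with n degrees of freedom (dfs). On the other hand, if $\mu > 0$, then $T \sim t_{n,\delta}$,
the noncentral t distribution with n dfs and (noncentrality) parameter $\delta =
\sqrt{n+1}\,\mu/\sigma$, with density

$$t_{n,\delta}(x) = \frac{n^{n/2} e^{-\delta^2/2}}{\sqrt{\pi}\,\Gamma(n/2)\,(n+x^2)^{(n+1)/2}} \sum_{k=0}^{\infty} \Gamma\!\left(\frac{n+k+1}{2}\right) \frac{\delta^k}{k!} \left(\frac{\sqrt{2}\,x}{\sqrt{n+x^2}}\right)^k.$$

Apparently $t_{n,0}(x) = t_n(x)$. Denote

$$a_{n,k} = \frac{2^{k/2}\,\Gamma((n+k+1)/2)}{\Gamma((n+1)/2)}.$$

Then

$$\frac{t_{n,\delta}(x)}{t_n(x)} = e^{-\delta^2/2} \sum_{k=0}^{\infty} \frac{a_{n,k}\,\delta^k}{k!} \left(\frac{x}{\sqrt{n+x^2}}\right)^k. \qquad (2.2)$$

It can be shown that $t_{n,\delta}(x)/t_n(x)$ is strictly increasing in x and

$$\sup_x \frac{t_{n,\delta}(x)}{t_n(x)} = \lim_{x\to\infty} \frac{t_{n,\delta}(x)}{t_n(x)} = e^{-\delta^2/2} \sum_{k=0}^{\infty} \frac{a_{n,k}\,\delta^k}{k!} \qquad (2.3)$$

(cf. (Q)). Since the supremum of likelihood ratio only depends on n and $r = \mu/\sigma$,
it will be denoted by $L(n, r)$ henceforth.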
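A minimal numerical sketch of $L(n, r)$ follows. It sums the standard series expansion of the noncentral t likelihood ratio at $x \to \infty$, with $\delta = \sqrt{n+1}\,r$, in log space, and then scans for the smallest n with $L(n, r) \ge Q_{\alpha,\pi}$. The function names, tolerances, and the linear scan are illustrative assumptions rather than anything prescribed by the article.

```python
import math

def sup_lr_t(n, r, rel_tol=1e-14, max_terms=100000):
    """Supremum L(n, r) of the noncentral-to-central t density ratio, n dfs."""
    delta = math.sqrt(n + 1) * r
    if delta <= 0.0:
        return 1.0
    log_c = math.log(math.sqrt(2.0) * delta)
    log_g0 = math.lgamma((n + 1) / 2.0)
    total, prev = 0.0, 0.0
    for k in range(max_terms):
        # log of Gamma((n+k+1)/2)/Gamma((n+1)/2) * (sqrt(2)*delta)^k / k! * exp(-delta^2/2)
        log_term = (math.lgamma((n + k + 1) / 2.0) - log_g0
                    + k * log_c - math.lgamma(k + 1.0) - 0.5 * delta * delta)
        term = math.exp(log_term)
        total += term
        # Stop once the terms are decreasing and negligible relative to the sum.
        if term < prev and term < rel_tol * total:
            break
        prev = term
    return total

def min_n_t(r, alpha, pi, n_max=10**6):
    """Smallest n <= n_max with L(n, r) >= Q_{alpha,pi}; None if not found."""
    q = (1.0 - alpha) * (1.0 - pi) / (alpha * pi)
    for n in range(1, n_max + 1):
        if sup_lr_t(n, r) >= q:
            return n
    return None
```

For example, `min_n_t(0.02, 0.05, 0.1)` can be compared with the leading-order value `math.log(171.0) / 0.02`; the two should be close when r is small.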
2.3. Proofs of the main results

We need two lemmas. They will be proved in the Appendix. The proofs of the
main results are rather straightforward. The proofs are given in order to illustrate
the basic argument, which is used for the other results of the article as well.

Lemma 2.1. 1) For any fixed n, $L(n, r) \to 1$ as $r \to 0$. 2) Given $a > 0$, if
$(n, r) \to (\infty, 0)$ such that $nr \to a$, then $L(n, r) \to e^a$. 3) If $(n, r) \to (\infty, 0)$ with
$nr \to \infty$, then $L(n, r) \to \infty$.

Lemma 2.2. Under the same conditions as in Corollary 2.1, as $(n, s) \to (\infty, 0)$
such that $ns \to a > 0$, $\int L(n, sr)\,G(dr) \to L_G(a)$.

Proof of Theorem 2.1. By (1.2), in order to get pFDR $\le \alpha$,

$$\frac{1-\pi}{1-\pi+\pi L(n, r)} \le \alpha, \quad \text{or} \quad L(n, r) \ge Q_{\alpha,\pi}.$$

Let $n_*$ be the minimum value of n in order for the inequality to hold. Then by
Lemma 2.1, as $r = \mu/\sigma \to 0$, $n_* r \to \ln Q_{\alpha,\pi}$, implying Theorem 2.1. □

Proof of Corollary 2.1. Following the argument for (1.2), it is seen that under
the conditions of the corollary, the minimum attainable pFDR is

$$\frac{1-\pi}{1-\pi+\pi \int L(n, sr)\,G(dr)}.$$

Then the corollary follows from an argument similar to that for Theorem 2.1. □



3. Multiple F-tests for linear regression with errors being normally
distributed

3.1. Main results 

Suppose we wish to test $H_i: \beta_i = 0$ simultaneously for a large number of joint
distributions of Y and X, such that under each distribution, $Y = \beta_i^T X + \epsilon_i$,
where $\beta_i \in \mathbb{R}^p$ are vectors of linear coefficients and $\epsilon_i \sim N(0, \sigma_i)$ are independent
of X. Suppose neither $\sigma_i$ nor any possible relationships among $\sigma_i$ are known.
Under this condition, consider the following tests based on a fixed design. Let
$X_k$, $k \ge 1$, be fixed vectors of covariates. Let n + p be the sample size per null.
For each i, let $(Y_{i1}, X_1), \ldots, (Y_{i,n+p}, X_{n+p})$ be an independent sample from
$Y = \beta_i^T X + \epsilon_i$. Assume that the samples for different $H_i$ are independent of
each other.

Suppose that, unknown to the data analyzer, for all the false nulls $H_i$,

$$(\beta_i^T X_1)^2 + \cdots + (\beta_i^T X_k)^2 \le k\,\delta^2 \sigma_i^2, \quad k = 1, 2, \ldots, \qquad (3.1)$$

where $\delta > 0$. This situation arises when all $X_k$ are within a bounded domain,
either because only regression within the domain is of interest, or because only 
covariates within the domain are observable or experimentally controllable. 

Note that n is not the sample size per null. Instead, it is the difference between
the sample size per null and the number of covariates in each regression equation.
Given $\alpha \in (0, 1)$, let

$$n_* = \inf\{n : \text{pFDR} \le \alpha \text{ for F tests on } H_i \text{ under the constraint (3.1)}\}.$$

It can be seen that $n_*$ is attained when equality holds in (3.1) for all the false
nulls. The asymptotics of $n_*$ will be considered for 3 cases: 1) $\delta \to 0$ while p is
fixed, 2) $\delta \to 0$ and $p \to \infty$, and 3) $p \to \infty$ while $\delta$ is fixed. The case $\delta \to 0$ is
relevant when the regression effects are weak, and the case $p \to \infty$ is relevant
when a large number of covariates are incorporated.

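Theorem 3.1. Under the random effects model (1.1) and the above setup of
multiple F tests, the following statements hold.

a) If $\delta \to 0$ while p is fixed, then

$$n_* \sim (1/\delta)\,M_p^{-1}(Q_{\alpha,\pi}), \quad \text{with } M_p(t) := \sum_{k=0}^{\infty} \frac{\Gamma(p/2)\,(t^2/4)^k}{k!\,\Gamma(k+p/2)}.$$

b) If $\delta \to 0$ and $p \to \infty$, then

$$n_* \sim \begin{cases} (1/\delta)\sqrt{2p \ln Q_{\alpha,\pi}} & \text{if } \delta^2 p \to 0, \\ (2/\delta^2)\ln Q_{\alpha,\pi} & \text{if } \delta^2 p \to \infty, \\ \dfrac{(4/\delta^2)\ln Q_{\alpha,\pi}}{1+\sqrt{1+8\ln Q_{\alpha,\pi}/L}} & \text{if } \delta^2 p \to L > 0. \end{cases}$$

c) Finally, if $\delta > 0$ is fixed while $p \to \infty$, then

$$n_* \to \left\lceil \frac{2\ln Q_{\alpha,\pi}}{\ln(1+\delta^2)} \right\rceil.$$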
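The three regimes in b) are limits of the single relation $n(n+p)\delta^2/(2p) = \ln Q_{\alpha,\pi}$, which is condition (3.4) of Lemma 3.3 with $a = \ln Q_{\alpha,\pi}$. The sketch below evaluates the positive root of that quadratic in a numerically stable form, together with the smallest integer n satisfying $(1+\delta^2)^{n/2} \ge Q_{\alpha,\pi}$, the limit given in Lemma 3.4. Function names are hypothetical, and the values are leading-order approximations only.

```python
import math

def n_leading_order_b(delta, p, q):
    """Positive root n of n(n + p) * delta^2 / (2p) = ln q, written to avoid cancellation."""
    a = math.log(q)
    return (4.0 * a / delta ** 2) / (1.0 + math.sqrt(1.0 + 8.0 * a / (p * delta ** 2)))

def n_limit_c(delta, q):
    """Smallest integer n with (1 + delta^2)^(n/2) >= q."""
    return math.ceil(2.0 * math.log(q) / math.log(1.0 + delta ** 2))

q = (1.0 - 0.05) * (1.0 - 0.1) / (0.05 * 0.1)   # alpha = 0.05, pi = 0.1

n_small = n_leading_order_b(0.001, 10, q)        # delta^2 * p close to 0
n_large = n_leading_order_b(0.1, 10 ** 8, q)     # delta^2 * p very large
n_fixed = n_limit_c(1.0, q)                      # delta fixed, p -> infinity
```

When $\delta^2 p$ is small the root is close to $(1/\delta)\sqrt{2p\ln Q_{\alpha,\pi}}$, and when $\delta^2 p$ is large it is close to $(2/\delta^2)\ln Q_{\alpha,\pi}$.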



3.2. Preliminaries and proofs 

Given data $(Y_1, X_1), \ldots, (Y_{n+p}, X_{n+p})$, such that $Y_i = \beta^T X_i + \epsilon_i$, where $X_i$
are fixed and $\epsilon_i$ are iid $\sim N(0, \sigma)$, if $\beta = 0$, then the F statistic of $(Y_i, X_i)$
follows the F distribution with (p, n) dfs. On the other hand, if $\beta \ne 0$, the F
statistic follows the noncentral F distribution with (p, n) dfs and (noncentrality)
parameter $\Delta$, where

$$\Delta = \frac{(\beta^T X_1)^2 + \cdots + (\beta^T X_{n+p})^2}{\sigma^2}.$$

The density of the noncentral F distribution is

$$f_{p,n,\Delta}(x) = e^{-\Delta/2}\,\theta^{p/2} x^{p/2-1} (1+\theta x)^{-(p+n)/2} \sum_{k=0}^{\infty} \frac{(\Delta/2)^k}{k!\,B(p/2+k, n/2)} \left(\frac{\theta x}{1+\theta x}\right)^k, \quad x \ge 0,$$

where $\theta = p/n$, and $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the Beta function. Note
$f_{p,n,0}(x) = f_{p,n}(x)$, the density of the usual F distribution with (p, n) dfs.
Denote

$$b_{p,n,k} = \frac{B(p/2, n/2)}{B(p/2+k, n/2)} = \prod_{j=0}^{k-1} \frac{n+p+2j}{p+2j}.$$

Then for $x > 0$,

$$\frac{f_{p,n,\Delta}(x)}{f_{p,n}(x)} = e^{-\Delta/2} \sum_{k=0}^{\infty} \frac{b_{p,n,k}\,(\Delta/2)^k}{k!} \left(\frac{\theta x}{1+\theta x}\right)^k,$$

which is strictly increasing, and

$$\sup_{x>0} \frac{f_{p,n,\Delta}(x)}{f_{p,n}(x)} = \lim_{x\to\infty} \frac{f_{p,n,\Delta}(x)}{f_{p,n}(x)} = e^{-\Delta/2} \sum_{k=0}^{\infty} \frac{b_{p,n,k}\,(\Delta/2)^k}{k!}. \qquad (3.3)$$

First, it is easy to see that the following statement is true.

Lemma 3.1. The expression in (3.3) is strictly increasing in $\Delta > 0$.

It follows that, under the constraint (3.1), the supremum of the likelihood
ratio is attained when $\Delta = (n+p)\delta^2$ and is equal to

$$K(p, n, \delta) = e^{-(n+p)\delta^2/2} \sum_{k=0}^{\infty} \frac{b_{p,n,k}\,[(n+p)\delta^2/2]^k}{k!}.$$

Therefore, under the random effects model (1.1), pFDR $\le \alpha$ is equivalent to
$K(p, n, \delta) \ge Q_{\alpha,\pi}$. Theorem 3.1 then follows from the lemmas below and an
argument similar to that of Theorem 2.1. The proof of Theorem 3.1 is omitted for
brevity. The proofs of the lemmas are given in the Appendix.

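Lemma 3.2. Fix $p \ge 1$. If $\delta \to 0$ and $n = n(\delta)$ such that $n\delta \to a \in [0, \infty)$,
then $K(p, n, \delta) \to M_p(a)$. If $n\delta \to \infty$, then $K(p, n, \delta) \to \infty$.

Lemma 3.3. Let $\delta \to 0$ and $p \to \infty$. If $n = n(\delta, p)$ such that

$$\frac{n(n+p)\delta^2}{2p} \to a > 0, \qquad (3.4)$$

then $K(p, n, \delta) \to e^a$. In particular, given $a > 0$, (3.4) holds if

$$n \sim \begin{cases} (1/\delta)\sqrt{2pa} & \text{if } \delta^2 p \to 0, \\ 2a/\delta^2 & \text{if } \delta^2 p \to \infty, \\ \dfrac{4a/\delta^2}{1+\sqrt{1+8a/L}} & \text{if } \delta^2 p \to L > 0. \end{cases}$$

Lemma 3.4. Fix $\delta > 0$. Then for any $n \ge 1$, $K(p, n, \delta) \to (1+\delta^2)^{n/2}$ as
$p \to \infty$.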
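The limits in Lemmas 3.2 and 3.4 can be checked numerically with a short sketch. It evaluates $K(p, n, \delta)$ as the Poisson-weighted series $e^{-\Delta/2}\sum_k b_{p,n,k}(\Delta/2)^k/k!$ with $\Delta = (n+p)\delta^2$ and $b_{p,n,k} = B(p/2, n/2)/B(p/2+k, n/2)$, and $M_p(t)$ from its defining series in Theorem 3.1, both in log space. The function names and stopping rules are illustrative assumptions.

```python
import math

def k_sup_lr_f(p, n, delta, rel_tol=1e-14, max_terms=10**6):
    """K(p, n, delta): supremum of the noncentral-to-central F density ratio."""
    half = 0.5 * (n + p) * delta ** 2          # Delta / 2
    if half <= 0.0:
        return 1.0
    log_half = math.log(half)
    g_np, g_p = math.lgamma((n + p) / 2.0), math.lgamma(p / 2.0)
    total, prev = 0.0, 0.0
    for k in range(max_terms):
        # log b_{p,n,k} via log-gamma, then the Poisson-type weight.
        log_b = math.lgamma((n + p) / 2.0 + k) - g_np - math.lgamma(p / 2.0 + k) + g_p
        term = math.exp(-half + log_b + k * log_half - math.lgamma(k + 1.0))
        total += term
        if term < prev and term < rel_tol * total:
            break
        prev = term
    return total

def m_p(p, t, rel_tol=1e-16, max_terms=10000):
    """M_p(t) = sum_k Gamma(p/2) (t^2/4)^k / (k! Gamma(k + p/2))."""
    x = t * t / 4.0
    if x == 0.0:
        return 1.0
    g_p, log_x = math.lgamma(p / 2.0), math.log(x)
    total, prev = 0.0, 0.0
    for k in range(max_terms):
        term = math.exp(g_p + k * log_x - math.lgamma(k + 1.0) - math.lgamma(k + p / 2.0))
        total += term
        if term < prev and term < rel_tol * total:
            break
        prev = term
    return total
```

For p = 3 the series for $M_p$ reduces to $\sinh(t)/t$, which gives a convenient sanity check.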

4. Multiple t-tests: a general case
4.1. Setup

Suppose we wish to conduct hypothesis tests for a large number of distributions
$F_i$ in order to identify those with nonzero mean $\mu_i$. The tests will be based on
random samples from $F_i$. Assume that no information on the forms of $F_i$ or
their relationships is available. As a result, samples from different $F_i$ cannot
be combined to improve the inference. As in the case of testing mean values
for normal distributions, to test $H_i: \mu_i = 0$ simultaneously, an appropriate
approach is to use the t statistics $T_i = \sqrt{n}\,\hat\mu_i/\hat\sigma_i$, where both $\hat\mu_i$ and $\hat\sigma_i$ are
derived solely from the sample from $F_i$, and n is the number of observations
used to get $\hat\mu_i$.
Again, the goal is to find the minimum sample size per null in order to 
attain a given pFDR level, in particular when Fi under false Hi only have 
small differences from those under true Hi. The results will also answer the 
following question: are normal or t approximations appropriate for the t statistics 
in determining the minimum sample size per null? 

We only consider the case where $\mu_i$ is either 0 or $\mu_0 \ne 0$, where $\mu_0$ is a
constant. In order to make the analysis tractable, the problem needs to be
formulated carefully. First, unlike the case of normal distributions, in general,
if $\hat\mu_i$ and $\hat\sigma_i^2$ are the mean and variance of the same random sample, they are
dependent and $\hat\sigma_i^2$ cannot be expressed as the sum of iid random variables. As
seen below, the analysis on the minimum sample size per null requires detailed
asymptotics of the t statistics, in particular, the so called exact LDP (0, d)- 
For Studentized statistics, there are LDP techniques available (fl7h . However, 
currently, exact LDP techniques cannot handle complex statistical dependency 
very well. To get around this technical difficulty, we consider the following t
statistics. Suppose the samples from different $F_i$ are independent of each other,
and contain the same number of iid observations. Divide the sample from $F_i$
into two parts, $\{X_{i1}, \ldots, X_{in}\}$ and $\{Y_{i1}, Y_{i2}, \ldots, Y_{i,2m}\}$. Let

$$T_i = \frac{\sqrt{n}\,\hat\mu_i}{\hat\sigma_i}, \quad \text{with } \hat\mu_i = \frac{1}{n}\sum_{k=1}^{n} X_{ik}, \quad \hat\sigma_i^2 = \frac{1}{2m}\sum_{k=1}^{m} (Y_{i,2k-1} - Y_{i,2k})^2.$$

Then $\hat\mu_i$ and $\hat\sigma_i^2$ are independent, and $\hat\sigma_i^2$ is the sum of iid random variables.
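The split-sample statistic above can be computed directly. The following is a small sketch with a hypothetical function name; it assumes the first part of the sample is passed as `x` (used for the mean) and the second part as `y` with an even number of observations (used for the variance through paired differences).

```python
import math

def split_sample_t(x, y):
    """T = sqrt(n) * mean(x) / sigma_hat, with sigma_hat^2 = sum_k (y[2k] - y[2k+1])^2 / (2m)."""
    n = len(x)
    if n == 0 or len(y) == 0 or len(y) % 2 != 0:
        raise ValueError("x must be non-empty and y must have positive even length")
    m = len(y) // 2
    mu_hat = sum(x) / n
    # Each pair contributes one independent summand with expectation 2 * sigma^2.
    var_hat = sum((y[2 * k] - y[2 * k + 1]) ** 2 for k in range(m)) / (2.0 * m)
    return math.sqrt(n) * mu_hat / math.sqrt(var_hat)

t_value = split_sample_t([1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0])
```

Because the variance estimate uses only differences within pairs, it is unaffected by a common shift of `y`, and the statistic is unchanged when `x` and `y` are rescaled by the same positive constant.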

Second, the minimum attainable pFDR depends on the supremum of the
ratio of the actual density of $T_i$ and its theoretical density under $H_i$. In general,
neither one is tractable analytically. To deal with this difficulty, observe that in
the case of normal distributions, the supremum of the ratio equals

$$\lim_{x\to\infty} \frac{P(T > x \mid \mu = \mu_0)}{P(T > x \mid \mu = 0)}.$$

We therefore consider the pFDR under the rule that $H_i$ is rejected if and only if
$T_i > x$, where $x > 0$ is a critical value. In order to identify false nulls as $\mu_0 \to 0$,
x must increase, otherwise $P(T > x \mid \mu = \mu_0)/P(T > x \mid \mu = 0) \to 1$, giving
pFDR $\to 1$. The question is how fast x should increase.

Recall Section 2. Some analysis on (2.2) and (2.3) shows that, for normal
distributions, the supremum of the likelihood ratio can be obtained asymptotically
by letting $x = c_n\sqrt{n}$, where $c_n > 0$ is an arbitrary sequence converging to $\infty$;
specifically, given $a > 0$, as $r \downarrow 0$ and $n \sim a/r$,

$$\frac{P(T > c_n\sqrt{n} \mid \mu/\sigma = r)\,/\,P(T > c_n\sqrt{n} \mid \mu = 0)}{\sup_x t_{n,\sqrt{n+1}\,r}(x)/t_n(x)} \to 1.$$

If, instead, x increases in the same order as $\sqrt{n}$ or more slowly, the above limit
is strictly less than 1. Based on this observation, for the general case, we set
$x = c_n\sqrt{n}$, with $c_n \to \infty$. In general, there is no guarantee that using $c_n$ growing
at a specific rate can always yield convergence. Thus, we require that $c_n$ grow
slowly.

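Under the setup, suppose that, unknown to the data analyzer, when $H_i:
\mu_i = 0$ is true, $F_i(x) = F(s_i x)$, and when $H_i$ is false, $F_i(x) = F(s_i x - d)$, where
$s_i > 0$ and $d > 0$, and F is an unknown distribution such that

F has a density f, $EX = 0$, $\sigma^2 := EX^2 < \infty$, for $X \sim F$. (4.1)

When $H_i$ is false, the sample from $F_i$ consists of $(X_{ij} + d)/s_i$, $1 \le j \le n$, and
$(Y_{ik} + d)/s_i$, $1 \le k \le 2m$, with $X_{ij}$, $Y_{ik}$ iid $\sim F$; when $H_i$ is true, the same
holds with d replaced by 0. Then the t statistic for $H_i$ is

$$T_i = \begin{cases} \sqrt{n}\,\bar X_{in}/S_{im} & \text{if } H_i \text{ is true}, \\ \sqrt{n}\,(\bar X_{in} + d)/S_{im} & \text{if } H_i \text{ is false}, \end{cases}$$

$$\text{where } \bar X_{in} = \frac{X_{i1} + \cdots + X_{in}}{n}, \quad S_{im}^2 = \frac{1}{2m}\sum_{k=1}^{m} (Y_{i,2k-1} - Y_{i,2k})^2.$$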

Let $N = n + m$ and $z_N = c_n$. Then $H_i$ is rejected if and only if $T_i > z_N\sqrt{n}$.
Under the random effects model (1.1), the minimum attainable pFDR is

$$\alpha_* = (1-\pi)\left[1 - \pi + \pi\,\frac{P(\bar X_n + d > z_N S_m)}{P(\bar X_n > z_N S_m)}\right]^{-1}, \qquad (4.2)$$

where $\bar X_n = \sum_{k=1}^{n} X_k/n$, and $S_m^2 = \sum_{k=1}^{m} (Y_{2k-1} - Y_{2k})^2/(2m)$, with $X_i$, $Y_j$ iid
$\sim F$. The question now is the following:

• Given $\alpha \in (0, 1)$, as $d \to 0$, how should N increase so that $\alpha_* \le \alpha$?
4.2. Main results

By the Law of Large Numbers, as $n \to \infty$ and $m \to \infty$, $\bar X_n \to 0$ and $S_m \to \sigma$
w.p. 1. On the other hand, by our selection, $z_N \to \infty$. In order to analyze (4.2)
as $d \to 0$, we shall rely on exact LDP, which depends on the properties of the
cumulant generating functions 



$$\Lambda(t) = \ln E e^{tX}, \quad \Psi(t) = \ln E\left[\exp\left\{\frac{t(X-Y)^2}{2}\right\}\right], \quad X, Y \text{ iid} \sim F. \qquad (4.3)$$



The density of $X - Y$ is $g(t) = \int f(x) f(x+t)\,dx$. It is easy to see that
$g(t) = g(-t)$ for $t > 0$. Recall that a function $\zeta$ is said to be slowly varying at
$\infty$, if for all $t > 0$, $\lim_{x\to\infty} \zeta(tx)/\zeta(x) = 1$.

Theorem 4.1. Suppose the following two conditions are satisfied.

a) $0 \in \mathcal{D}_\Lambda^o$ and $\Lambda(t) \to \infty$ as $t \uparrow \sup \mathcal{D}_\Lambda$, where $\mathcal{D}_\Lambda = \{t : \Lambda(t) < \infty\}$.

b) The density function g is continuous and bounded on $(\epsilon, \infty)$ for any $\epsilon > 0$,
and there exist a constant $\lambda > -1$ and a function $\zeta(z) > 0$ which is increasing
in $z > 0$ and slowly varying at $\infty$, such that

$$\lim_{x \downarrow 0} \frac{g(x)}{x^{\lambda}\,\zeta(1/x)} = C \in (0, \infty). \qquad (4.4)$$

Fix $\alpha \in (0, 1)$. Let $N_*$ be the minimum value for $N = m + n$ in order to
attain $\alpha_* \le \alpha$, where $\alpha_*$ is as in (4.2). Then, under the constraints 1) m and n
grow in proportion to each other such that $m/N \to \rho \in (0, 1)$ as $m, n \to \infty$ and
2) $z_N \to \infty$ slowly enough, one gets

$$N_* \sim \frac{1}{d} \times \frac{\ln Q_{\alpha,\pi}}{(1-\rho)\,t_0}, \quad \text{as } d \to 0+, \qquad (4.5)$$

where $t_0 > 0$ is the unique positive solution to

$$t\Lambda'(t) = \frac{(1+\lambda)\rho}{1-\rho}. \qquad (4.6)$$
1 -P 

Remark. (1) By (4.5) and (4.6), $N_*$ depends on the moments of F of all
orders. Thus, t or normal approximations of the distribution of T in general are
not suitable in determining $N_*$ in order to attain a target pFDR level.

(2) If $z_N \to \infty$ slowly enough such that (4.5) holds, then for any $z'_N \to \infty$
more slowly, (4.5) holds as well. Presumably, there is an upper bound for the
growth rate of $z_N$ in order for (4.5) to hold. However, it is not available with
the technique employed by this work.

(3) We define N as n + m instead of n + 2m because in the estimator $S_m$,
each pair of observations only generates one independent summand. The sum
n + m can be thought of as the number of degrees of freedom that are effectively
utilized by T.
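As a hedged illustration of how (4.5) and (4.6) might be evaluated in practice, the sketch below solves $t\Lambda'(t) = (1+\lambda)\rho/(1-\rho)$ by bisection and plugs the root into the leading-order expression for $N_*$. The function names are hypothetical, the caller is assumed to supply $t \mapsto t\Lambda'(t)$ (increasing on the search interval), and the normal case $\Lambda(t) = \sigma^2 t^2/2$ is used only as an example input.

```python
import math

def solve_t0(t_lambda_prime, lam, rho, t_hi, tol=1e-13):
    """Positive root of t * Lambda'(t) = (1 + lam) * rho / (1 - rho) on (0, t_hi)."""
    target = (1.0 + lam) * rho / (1.0 - rho)
    if t_lambda_prime(t_hi) < target:
        raise ValueError("t_hi is too small to bracket the root")
    lo, hi = 0.0, t_hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_lambda_prime(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def n_star_leading_order(d, q, rho, t0):
    """Leading-order N_* from (4.5): ln(q) / (d * (1 - rho) * t0)."""
    return math.log(q) / (d * (1.0 - rho) * t0)

# Example input (assumption): F = N(0, sigma), so t * Lambda'(t) = sigma^2 * t^2, lam = 0.
sigma, rho = 2.0, 0.3
t0_normal = solve_t0(lambda t: sigma ** 2 * t * t, lam=0.0, rho=rho, t_hi=100.0)
n_approx = n_star_leading_order(d=0.01, q=171.0, rho=rho, t0=t0_normal)
```

For the normal input the root has the closed form $(1/\sigma)\sqrt{\rho/(1-\rho)}$, which can be used to check the solver.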

Following the proof for the case of normal distributions, Theorem 4.1 is a
consequence of the following result.
Proposition 4.1. Let $T > 0$. Under the same conditions as in Theorem 4.1,
suppose $d = d_N \to 0$, such that $d_N N \to T > 0$. Then

$$\frac{P(\bar X_n + d_N > z_N S_m)}{P(\bar X_n > z_N S_m)} \to e^{(1-\rho)T t_0}. \qquad (4.7)$$

Indeed, by display (4.2) and Proposition 4.1, if $dN \to T > 0$, then the
minimum attainable pFDR has convergence

$$\alpha_* \to \frac{1-\pi}{1-\pi+\pi e^{(1-\rho)T t_0}}. \qquad (4.8)$$

In order to attain pFDR $\le \alpha$, there must be $\alpha_* \le \alpha$, leading to (4.5). The proof
of Proposition 4.1 is given in the Appendix A.3.



4.3. Examples

Example 4.1 (Normal distribution). Under the setup in Section 4.1, let $F =
N(0, \sigma)$ in (4.1). By $\Lambda(t) = \ln E(e^{tX}) = \sigma^2 t^2/2$, condition a) of Theorem 4.1 is
satisfied. For X, Y iid $\sim F$, $X - Y \sim N(0, \sqrt{2}\sigma)$. Therefore, (4.4) is satisfied
with $\lambda = 0$ and $\zeta(x) \equiv 1$. The solution to (4.6) is $t_0 = (1/\sigma)\sqrt{\rho/(1-\rho)}$. Then
by Theorem 4.1,

$$N_* \sim \frac{\sigma}{d} \times \frac{\ln Q_{\alpha,\pi}}{\sqrt{\rho(1-\rho)}}, \quad \text{as } d \to 0+. \qquad (4.9)$$

To see the connection to Theorem 2.1, observe $\bar X_n = \sigma Z/\sqrt{n}$ and $S_m =
\sigma W_m/\sqrt{m}$, where $Z \sim N(0, 1)$ and $W_m \sim \chi_m$ are independent. Since $z_N \uparrow \infty$
slowly, so is $a_m := \sqrt{n/m}\,z_N$. Let $r_m = (d/\sigma)\sqrt{n/(m+1)}$. Then

$$\frac{P(\bar X_n + d > z_N S_m)}{P(\bar X_n > z_N S_m)} = \frac{P(Z + \sqrt{m+1}\,r_m > a_m W_m)}{P(Z > a_m W_m)} = \frac{1 - T_{m,\sqrt{m+1}\,r_m}(\sqrt{m}\,a_m)}{1 - T_m(\sqrt{m}\,a_m)},$$

where $T_{m,\delta}$ denotes the cumulative distribution function (cdf) of the noncentral
t distribution with m dfs and parameter $\delta$, and $T_m$ the cdf of the t distribution
with m dfs. Comparing the ratio in (2.2) and the above ratio, it is seen that the
difference between the two is that probability densities in (2.2) are replaced
with tail probabilities. Since $r_m = (d/\sigma)\sqrt{n/(m+1)} \sim (d/\sigma)\sqrt{(1-\rho)/\rho}$, by
Theorem 2.1, in order to attain pFDR $\le \alpha$ based on (2.2), the minimum value
$m_*$ for m satisfies $m_* \sim (\sigma/d)\sqrt{\rho/(1-\rho)}\,\ln Q_{\alpha,\pi}$. Since $m_*/N_* \to \rho$, the
asymptotic of $N_*$ given by Theorem 2.1 is identical to that given by Theorem
4.1.



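Example 4.2 (Uniform distributions). Under the setup in Section 4.1, let $F =
U(-\frac{1}{2}, \frac{1}{2})$ in (4.1). Then for $t > 0$,

$$\Lambda(t) = -\frac{t}{2} + \ln(e^t - 1) - \ln t, \quad t\Lambda'(t) = \frac{t}{2\tanh(t/2)} - 1,$$

and for $t < 0$, $\Lambda(t) = \Lambda(-t)$. Thus condition a) in Theorem 4.1 is satisfied. It
is easy to see that condition b) is satisfied as well, with $\lambda = 0$ and $\zeta(x) \equiv 1$ in
(4.4). Then by (4.5),

$$N_* \sim \frac{1}{d} \times \frac{\ln Q_{\alpha,\pi}}{2\tanh(t_0/2)}, \quad \text{where } t_0 > 0 \text{ solves } \frac{t_0}{2\tanh(t_0/2)} = \frac{1}{1-\rho}. \qquad (4.10)$$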



Example 4.3 (Gamma distribution). Under the setup in Section 4.1, let F be
the distribution of $\xi - a\beta$, where $\xi \sim$ gamma$(a, \beta)$ with density
$\beta^{-a} x^{a-1} e^{-x/\beta}/\Gamma(a)$. For $0 < t < 1/\beta$,

$$\Lambda(t) = \ln E[e^{t(\xi - a\beta)}] = -a\ln(1 - \beta t) - a\beta t, \quad t\Lambda'(t) = \frac{a\beta^2 t^2}{1 - \beta t}.$$

Therefore, condition a) in Theorem 4.1 is satisfied. Because the value of $\lambda$ in (4.4)
is invariant to scaling, in order to verify condition b), without loss of generality,
let $\beta = 1$. For $x > 0$, the density of $X - Y$ is then $g(x) = e^{-x} k(x)/\Gamma(a)^2$, where
$k(x) = \int_0^\infty u^{a-1}(u + x)^{a-1} e^{-2u}\,du$. It suffices to consider the behavior of $k(x)$
as $x \downarrow 0$. We need to analyze 3 cases.

Case 1: $a > 1/2$. As $x \downarrow 0$, $k(x) \to \int_0^\infty u^{2a-2} e^{-2u}\,du < \infty$. Therefore, (4.4)
holds with $\lambda = 0$ and $\zeta \equiv 1$.

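Case 2: $a = 1/2$. As $x \downarrow 0$, $k(x) \to \infty$. We show that (4.4) still holds
with $\lambda = 0$, but $\zeta(z) = \ln z$. To establish this, for any $\epsilon > 0$, let $k_\epsilon(x) =
\int_0^\epsilon u^{-1/2}(u + x)^{-1/2}\,du$. Since $e^{-2\epsilon} \le e^{-2u} \le 1$ on $[0, \epsilon]$ and the part of the
integral defining $k(x)$ over $(\epsilon, \infty)$ stays bounded as $x \downarrow 0$,

$$e^{-2\epsilon} \le \liminf_{x \downarrow 0} \frac{k(x)}{k_\epsilon(x)} \le \limsup_{x \downarrow 0} \frac{k(x)}{k_\epsilon(x)} \le 1.$$

By variable substitution $u = xt^2$,

$$k_\epsilon(x) = 2\int_0^{\sqrt{\epsilon/x}} \frac{dt}{\sqrt{t^2 + 1}} = (1 + o(1))\ln(1/x), \quad \text{as } x \downarrow 0.$$

As a result,

$$e^{-2\epsilon} \le \liminf_{x \downarrow 0} \frac{k(x)}{\ln(1/x)} \le \limsup_{x \downarrow 0} \frac{k(x)}{\ln(1/x)} \le 1.$$

Since $\epsilon$ is arbitrary, (4.4) is satisfied with $\lambda = 0$ and $\zeta(z) = \ln z$.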

Case 3: $a < 1/2$. As $x \downarrow 0$, $k(x) \to \infty$. Similar to the case $a = 1/2$, it suffices
to consider the behavior of $k_\epsilon(x) = \int_0^\epsilon u^{a-1}(u + x)^{a-1}\,du$ as $x \downarrow 0$, where $\epsilon > 0$
is arbitrary. By variable substitution $u = tx$,

$$k_\epsilon(x) = x^{2a-1}\int_0^{\epsilon/x} t^{a-1}(t + 1)^{a-1}\,dt = (1 + o(1))\,C_a\,x^{2a-1}, \quad \text{as } x \downarrow 0,$$

where $C_a = \int_0^\infty t^{a-1}(t + 1)^{a-1}\,dt < \infty$. Therefore, (4.4) is satisfied with $\lambda =
2a - 1$ and $\zeta(z) \equiv 1$.

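From the above analysis and (4.5), $N_* \sim (1/d)(\ln Q_{\alpha,\pi})/[(1-\rho)t_0]$, where

$$t_0 = \frac{\sqrt{\gamma^2 + 2\gamma} - \gamma}{\beta}, \quad \text{with } \gamma = \frac{1}{1 \vee (2a)} \times \frac{\rho}{1-\rho}. \qquad (4.11)$$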



4.4. Optimal split of sample

For the t statistics considered so far, $m/N$ is the fraction of degrees of freedom
allocated for the estimation of variance. By (4.5), the asymptotic of $N_*$ depends
on the fraction in a nontrivial way. It is of interest to optimize the fraction in
order to minimize $N_*$. Asymptotically, this is equivalent to maximizing $(1-\rho)t_0$
as a function of $\rho$, with $t_0 = t_0(\rho) > 0$ as in (4.6).



Example 4.1 (Continued)

By (4.9), it is apparent that the optimal value of $\rho$ is 1/2. In other words,
in order to minimize $N_*$, there should be an equal number of degrees of freedom
allocated for the estimation of mean and the estimation of variance for each
normal distribution. In particular, if $m = n - 1$, then $\rho = 1/2$, and the resulting
t statistic has the same distribution as $\sqrt{n-1}\,Z/W_{n-1}$, where $Z \sim N(0, 1)$ and
$W_{n-1} \sim \chi_{n-1}$ are independent, which is the usual t statistic of an iid sample of
size n.



Example 4.2 (Continued)

By (4.10), the larger $\tanh(t_0/2)$ is, the smaller $N_*$ becomes. The function $\tanh(t_0/2)$
is strictly increasing in $t_0$, and $\tanh(t_0/2) \to 1$ as $t_0 \to \infty$. By $\rho = 1 - 2\tanh(t_0/2)/t_0$,
the closer $\rho$ is to 1, the smaller $N_*$.



Example 4.3 (Continued)

Denote $\theta = 1/[1 \vee (2a)]$. By (4.11), with $\gamma = \theta\rho/(1-\rho)$, we need to find $\rho$ to maximize

$$(1-\rho)\left[\sqrt{\gamma^2 + 2\gamma} - \gamma\right] = \sqrt{\theta^2\rho^2 + 2\theta\rho(1-\rho)} - \theta\rho.$$

By some calculation, the value of $\rho$ that maximizes the above quantity is

$$\frac{1}{2 + \sqrt{2\theta}} = \frac{1}{2 + \sqrt{2 \wedge (1/a)}}.$$

For $0 < a < 1/2$, the optimal fraction of degrees of freedom allocated for the
estimation of the variance of gamma$(a, \beta)$ tends to $1/(2 + \sqrt{2})$ as $d \to 0$. On
the other hand, as $a \to \infty$, the optimal fraction tends to 1/2 as $d \to 0$, which is
reasonable in light of Example 4.1. To see this, let $\beta = 1$. For integer valued a
and $\xi \sim$ gamma$(a, 1)$, $\xi - a$ can be regarded as the sum of $W_i - 1$, $i = 1, \ldots, a$,
with $W_i$ iid following gamma$(1, 1)$. Therefore, for $a \gg 1$, $\xi - a$ follows closely
a normal distribution with mean 0. Thus by Example 4.1, the optimal value of
$m/(n + m)$ is close to 1/2.
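The optimal split for the Gamma example can be checked numerically. The sketch below maximizes $\sqrt{\theta^2\rho^2 + 2\theta\rho(1-\rho)} - \theta\rho$ over a grid of $\rho$ values and compares the result with $1/(2 + \sqrt{2\theta})$, where $\theta = 1/[1 \vee (2a)]$. The function names and the grid resolution are illustrative choices, and a grid search is only a crude stand-in for the calculus argument.

```python
import math

def split_objective(rho, theta):
    """(1 - rho) * beta * t0 as a function of rho, for the Gamma example."""
    return math.sqrt(theta ** 2 * rho ** 2 + 2.0 * theta * rho * (1.0 - rho)) - theta * rho

def optimal_rho_closed_form(a):
    """1 / (2 + sqrt(2 * theta)) with theta = 1 / max(1, 2a)."""
    theta = 1.0 / max(1.0, 2.0 * a)
    return 1.0 / (2.0 + math.sqrt(2.0 * theta))

def optimal_rho_grid(a, grid=100000):
    """Grid maximizer of the objective over rho in (0, 1)."""
    theta = 1.0 / max(1.0, 2.0 * a)
    best = max(range(1, grid), key=lambda i: split_objective(i / grid, theta))
    return best / grid

rho_small_shape = optimal_rho_closed_form(0.25)    # shape below 1/2
rho_large_shape = optimal_rho_closed_form(1000.0)  # large shape, near the normal case
```

For a shape parameter below 1/2 the closed form equals $1/(2 + \sqrt{2})$, and for a large shape parameter it approaches 1/2.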

5. Multiple tests based on likelihoods 
5.1. Motivation 

In many cases of multiple testing, only limited knowledge is available on the 
distributions from which data are sampled. The knowledge relevant to a null 
hypothesis is expressed by a statistic M such that the null is rejected if and 
only if the observed value of M is significantly different from 0. In general, as 
the distribution of M is unknown, M has to be Studentized so that its magnitude 
can be evaluated. 

On the other hand, oftentimes, despite the complexity of the data distributions,
it is reasonable to believe they have an underlying structure. Consider
the scenario where all the data distributions belong to a parametric family $\{p_\theta\}$,
such that the distribution under a true null is $p_0$, and the one under a false null
is $p_{\theta_*}$ for some $\theta_* \ne 0$. A question of interest is: under this circumstance, what
would be the optimal overall performance of the multiple tests? The question is 
in the same spirit as questions regarding estimation efficiency. However, it as- 
sumes that neither the existence of the parameterization nor its form is known 
to the data analyzer and all the machinery available is the test statistic M. 

As before, we wish to find out the minimum sample size per null required
for pFDR control, in particular, as the tests become increasingly harder in the
sense that $\theta_* \to 0$. Our conjecture is that, asymptotically, the minimum
sample size per null is attained if $M$ "happens" to be $d[\ln p_0]/d\theta$.
By "happens" we mean that the data analyzer is unaware of this peculiar nature
of $M$ and uses its Studentized version for the tests. This conjecture is
directly motivated by the fact that the MLE is efficient under regular
conditions. Although a smaller minimum sample size per null could be possible
if $M$ happens to be the MLE, due to Studentization, the improvement appears to
diminish as $\theta_* \to 0$. Certainly, had the parameterization been known,
the (original) MLE would be preferred. The goal here is not to establish any
sort of superiority of the Studentized MLE, but rather to search for the
optimal overall performance of multiple tests, when we are aware that our
knowledge about the data distributions is incomplete and, beyond the test
statistic, we have no other information.

The above conjecture is not yet proved or disproved. However, as a first step,
we would like to obtain the asymptotics of the minimum sample size per null
when Studentized $d[\ln p_0]/d\theta$ is used for multiple tests. We shall also
provide some examples to support the conjecture.






5.2. Setup 

Let $(\Omega, \mathcal F)$ be a measurable space equipped with a
$\sigma$-finite measure $\mu$. Let $\{p_\theta : \theta \in [0,1]\}$ be a
parametric family of density functions on $\Omega$ with respect to $\mu$.
Denote by $P_\theta$ the corresponding probability measure. Under the random
effects model (1.1), each null $H_i$ is associated with a distribution $F_i$,
such that when $H_i$ is true, $F_i = P_0$, and when $H_i$ is false,
$F_i = P_\theta$, where $\theta > 0$ is a constant. Assume that each $H_i$ is
tested based on an iid sample $\{\omega_{ij}\}$ from $F_i$, such that the
samples for different $H_i$ are independent, and the sample size is the same
for all $H_i$.

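We need to assume some regularities for $p_\theta$. Denote

$$r_\theta(\omega) = \frac{p_\theta(\omega)}{p_0(\omega)}, \qquad
\ell_\theta(\omega) = \ln p_\theta(\omega), \qquad \omega \in \Omega. \tag{5.1}$$

Condition 1 Under $P_0$, for almost every $\omega \in \Omega$,
$p_0(\omega) > 0$ and $p_\theta(\omega)$ as a function of $\theta$ is in
$C^2([0,1])$.

Condition 2 The Fisher information at $\theta = 0$ is positive and finite,
i.e. $0 < \|\dot\ell_0\|_{L^2(P_0)} < \infty$, where the "dot" notation denotes
partial differentiation with respect to $\theta$.

Condition 3 Under $P_0$, the second order derivative of $\ell_\theta(\omega)$
is uniformly bounded in the sense that
$\sup_{\theta\in[0,1]} \|\ddot\ell_\theta(\omega)\|_{L^\infty(P_0)} < \infty$.

Condition 4 For any $q > 0$, there is $\theta' = \theta'(q) > 0$, such that

$$E_0\left[\sup_{\theta\in[0,\theta']}
\left(r_\theta(\omega)^q + r_\theta(\omega)^{-q}\right)\right] < \infty. \tag{5.2}$$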



Remark. By Condition 1, for any interval $I$ in $[0,1]$, the extrema of
$r_\theta(\omega)$ over $\theta \in I$ are measurable. Thus the expectation in
(5.2) is well defined.

For brevity, for $\theta \in [0,1]$ and $n \ge 1$, the $n$-fold product measure
of $P_\theta$ is still denoted by $P_\theta$, and the expectation under the
product measure by $E_\theta$. We shall denote by $\omega$, $\omega'$,
$\omega_i$, $\omega'_i$ generic iid elements under a generic distribution on
$(\Omega, \mathcal F)$. Denote

$$X = \dot\ell_0(\omega), \quad Y = \dot\ell_0(\omega'), \quad
X_i = \dot\ell_0(\omega_i), \quad Y_i = \dot\ell_0(\omega'_i). \tag{5.3}$$

For $m, n \ge 1$, denote

$$S_m^2 = \frac{1}{2m}\sum_{i=1}^m (Y_{2i-1} - Y_{2i})^2, \qquad
\bar X_n = \frac{X_1 + \cdots + X_n}{n}.$$

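Since $\dot\ell_\theta(\omega) = \dot p_\theta(\omega)/p_\theta(\omega)$, from
Conditions 1-4 and dominated convergence, it follows that
$E_0\dot\ell_0 = 0$ and

$$(E_\theta\dot\ell_0)'\big|_{\theta=0}
= \frac{d}{d\theta}\int \dot\ell_0(\omega)\,p_\theta(\omega)\,\mu(d\omega)
\Big|_{\theta=0}
= \int (\dot\ell_0(\omega))^2\,p_0(\omega)\,\mu(d\omega) > 0.$$

As a result, for $\theta > 0$ close to 0, $E_\theta\dot\ell_0(\omega) > 0$.
This justifies using the upper tail of $\sqrt n\,\bar X_n/S_m$ for testing. The
multiple tests are such that

$$H_i \text{ is rejected} \iff
\frac{\sqrt n\,\bar X_{in}}{S_{im}} > \sqrt n\, z_N, \tag{5.4}$$

where $\bar X_{in}$ and $S_{im}$ are computed the same way as $\bar X_n$ and
$S_m$, except that they are derived from
$\omega_{i1}, \ldots, \omega_{in}, \omega'_{i1}, \ldots, \omega'_{i,2m}$ iid
$\sim F_i$, $N = n + m$, and $z_N \to \infty$ as $N \to \infty$. Then, under
the random effects model, the minimum attainable pFDR is

$$\alpha_* = \frac{1-\pi}{1 - \pi + \pi\,
\dfrac{P_\theta(\bar X_n > z_N S_m)}{P_0(\bar X_n > z_N S_m)}}. \tag{5.5}$$

The question now is the following:

• Given $\alpha \in (0,1)$, as $\theta \downarrow 0$, how should $N$ increase
so that $\alpha_* \le \alpha$?

5.3. Main results

Denote the cumulant generating functions

$$\Lambda(t) = \ln E_0\left(e^{tX}\right), \qquad
\Psi(t) = \ln E_0 \exp\left[\frac{t(X-Y)^2}{2}\right]. \tag{5.6}$$

Note that the expectation is taken under $P_0$.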

Theorem 5.1. Suppose $\{p_\theta : \theta \in [0,1]\}$ satisfies Conditions 1-4
and the following conditions a)-d) are fulfilled.

a) $0 \in \mathcal D_\Lambda^o$, where
$\mathcal D_\Lambda = \{t : \Lambda(t) < \infty\}$.

b) Under $P_0$, $X$ has a density $f$ continuous almost everywhere on
$\mathbb R$. Furthermore, either (i) $f$ is bounded or (ii) $f$ is symmetric
and $\|X\|_{L^\infty(P_0)} < \infty$.

c) Under $P_0$, the density $g$ of $X - Y$ is continuous and bounded on
$(\epsilon, \infty)$ for any $\epsilon > 0$, and there exist a constant
$\lambda > -1$ and a function $\zeta(z) > 0$ increasing in $z > 0$ and slowly
varying at $\infty$, such that

$$\lim_{u \downarrow 0} \frac{g(u)}{u^\lambda \zeta(1/u)} = C \in (0, \infty).
\tag{5.7}$$

d) There are $s > 0$ and $L > 0$, such that

P [e s|X+Y| |X - F = u] < Le*'"', any w^O, g(u) > 0. (5.8) 

Fix $\alpha \in (0,1)$. Let $N_*$ be the minimum value of $N = n + m$ in order
to attain $\alpha_* \le \alpha$, where $\alpha_*$ is as in (5.5). Then, under
the constraints 1) $m$ and $n$ grow in proportion to each other such that
$m/N \to \rho \in (0,1)$ as $m, n \to \infty$ and 2) $z_N \to \infty$ slowly
enough, one gets

$$N_* \sim \frac{1}{\theta} \times
\frac{\ln Q_{\alpha,\pi}}{(1-\rho)\Lambda'(t_0) + 2\rho K_f},
\qquad \text{as } \theta \to 0+, \tag{5.9}$$

where $t_0$ is the unique positive solution to (4.6), and

$$K_f = \begin{cases}
\displaystyle\int z\,h_0(z)\,dz & \text{if } f \text{ is bounded, with }
h_0 = f^2 \Big/ \int f^2, \\[2ex]
0 & \text{if } f \text{ is symmetric and } \|X\|_{L^\infty(P_0)} < \infty.
\end{cases}$$

Remark. By symmetry, to verify (5.8), it is enough to only consider $u > 0$.
Moreover, (5.8) holds if its left hand side is a bounded function of $u$.

Following the proofs of the previous results, Theorem 5.1 is a consequence of
Proposition 5.1, which will be proved in Appendix A4.



Proposition 5.1. Let $T > 0$. Under the same conditions as in Theorem 5.1,
suppose $\theta = \theta_N \to 0$, such that $\theta_N N \to T$. Then

$$\frac{P_{\theta_N}(\bar X_n > z_N S_m)}{P_0(\bar X_n > z_N S_m)}
\to \exp\{(1-\rho)T\Lambda'(t_0) + 2\rho T K_f\}. \tag{5.10}$$



5.4. Examples

Example 5.1 (Normal distributions). Under the setup in Section 5.2, suppose
for $\theta \in [0,1]$, $P_\theta = N(\theta, \sigma)$, where $\sigma > 0$ is a
fixed constant. Then
$p_\theta(u) = \exp[-(u-\theta)^2/(2\sigma^2)]/\sqrt{2\pi\sigma^2}$, giving

$$r_\theta(u) = \exp\left(\frac{2\theta u - \theta^2}{2\sigma^2}\right), \qquad
\ell_\theta(u) = -\frac{(u-\theta)^2}{2\sigma^2} - \frac{\ln(2\pi\sigma^2)}{2},$$

$$\dot\ell_\theta(u) = \frac{u-\theta}{\sigma^2}, \qquad
\ddot\ell_\theta(u) = -\frac{1}{\sigma^2}.$$

For $\omega \sim P_0$, $\dot\ell_0(\omega) = \omega/\sigma^2 \sim
N(0, 1/\sigma)$. It is then not hard to see that Conditions 1-4 are satisfied.
By the notations in (5.3), $X$, $Y$, $X_i$, $Y_i$ are iid
$\sim N(0, 1/\sigma)$. Then $\Lambda(t) = t^2/(2\sigma^2)$ and condition a) of
Theorem 5.1 is satisfied. It is easy to see that conditions b) and c) are
satisfied with $\lambda = 0$ and $\zeta \equiv 1$ in (5.7). Since $X + Y$ and
$X - Y$ are independent and the moment generating function of
$|X + Y| \sim \sqrt 2\,|X|$ is finite on the entire $\mathbb R$, (5.8) is
satisfied as well. Therefore, (5.10) holds, and hence (5.9) holds for $N_*$.

To get the asymptotic in (5.9) explicitly, note that the density $f$ of $X$ is
that of $N(0, 1/\sigma)$, which is bounded and symmetric about 0. Then it is
not hard to see $K_f = 0$. On the other hand, since
$\Lambda'(t) = t/\sigma^2$, the solution $t_0 > 0$ to
$t\Lambda'(t) = \rho/(1-\rho)$ equals $\sigma\sqrt{\rho/(1-\rho)}$ and hence
$\Lambda'(t_0) = (1/\sigma)\sqrt{\rho/(1-\rho)}$. Thus,
$N_* \sim (\sigma/\theta)\,(\ln Q_{\alpha,\pi}/\sqrt{\rho(1-\rho)})$, which is
identical to (4.9) for the $t$ tests.
Example 5.2 (Cauchy distribution). Under the setup in Section 5.2, suppose
for $\theta \in [0,1]$, $P_\theta$ is the Cauchy distribution centered at
$\theta$ such that its density is
$p_\theta(u) = \pi^{-1}[1 + (u-\theta)^2]^{-1}$, $u \in \mathbb R$. Then

$$r_\theta(u) = \frac{1+u^2}{1+(u-\theta)^2}, \qquad
\ell_\theta(u) = -\ln[1 + (u-\theta)^2] - \ln\pi,$$

$$\dot\ell_\theta(u) = \frac{2(u-\theta)}{1+(u-\theta)^2}, \qquad
\dot\ell_0(u) = \frac{2u}{1+u^2}.$$

By the notations in (5.3), $X = 2\omega/(1+\omega^2)$, with
$\omega \sim P_0$. Recall that $P_0$ is the distribution of $\tan(\xi/2)$ with
$\xi \sim U(-\pi, \pi)$. Therefore, $X \sim \sin\xi$ and thus is bounded and
has a symmetric distribution. It is clear that conditions a), b), and d) of
Theorem 5.1 are satisfied. We show that condition c) is satisfied with
$\lambda = 0$ and $\zeta(z) = \ln z$ in (5.7). The density $f$ of $X$ is
$1/[\pi\sqrt{1-u^2}]$, $u \in [-1, 1]$. Then $K_f = 0$ and the density of
$X - Y$ is

$$g(u) = k(u)/\pi^2, \quad \text{with} \quad
k(u) = \int_{-1}^{1-u} \frac{dt}{\sqrt{(1-t^2)[1-(t+u)^2]}}, \quad
u \in (0,1).$$

Given $\epsilon \in (0, 1 - u/2)$, write the integral as the sum of integrals
over $[-1, -1+\epsilon]$, $[1-u-\epsilon, 1-u]$, and
$[-1+\epsilon, 1-u-\epsilon]$. By variable substitution,

$$k(u) = 2\int_0^\epsilon \frac{dt}{\sqrt{(2-t)(2-t-u)\,t\,(t+u)}}
+ \int_{-1+\epsilon}^{1-u-\epsilon} \frac{dt}{\sqrt{(1-t^2)[1-(t+u)^2]}}
\sim 2\int_0^\epsilon \frac{dt}{\sqrt{(2-t)(2-t-u)\,t\,(t+u)}},
\quad \text{as } u \to 0.$$

Because $\epsilon > 0$ is arbitrary, it follows that $k(u) \sim k_1(u)$, where

$$k_1(u) = \int_0^\epsilon \frac{dt}{\sqrt{t(t+u)}}
= 2\int_0^{\sqrt{\epsilon/u}} \frac{dx}{\sqrt{1+x^2}} \sim \ln(1/u),$$

with the second equality due to variable substitution $t = ux^2$. This shows
that (5.7) holds with $\lambda = 0$ and $\zeta(z) = \ln z$. By (5.9),
$N_* \sim (t_0/\theta) \times (\ln Q_{\alpha,\pi}/\rho)$, as $\theta \to 0$,
where $t_0$ is the positive solution to $t_0\Lambda'(t_0) = \rho/(1-\rho)$,
with $\Lambda(t) = \ln E[e^{t\sin\xi}]$.

Remark. Because the Cauchy distributions have infinite variance, $t$ tests
cannot be used to test the nulls. The example shows that even in this case,
Studentized $\dot\ell_0(\omega)$ can still distinguish between true and false
nulls.
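
The quantities in Example 5.2 are easy to evaluate numerically. The sketch below checks the identity $2\omega/(1+\omega^2) = \sin\xi$ for $\omega = \tan(\xi/2)$, evaluates $\Lambda(t) = \ln E[e^{t\sin\xi}]$ by the midpoint rule, and solves $t_0\Lambda'(t_0) = \rho/(1-\rho)$ by bisection. The function names, the quadrature size and the bracketing strategy are assumptions made for illustration; the solver is meant for moderate $\rho$ only, since `math.exp` overflows for very large arguments.

```python
import math

def score_at_zero(omega):
    """Score of the Cauchy location family at theta = 0: 2 omega / (1 + omega^2)."""
    return 2.0 * omega / (1.0 + omega * omega)

def _moments(t, steps=2000):
    """Midpoint-rule values of E[exp(t sin xi)] and E[sin(xi) exp(t sin xi)],
    with xi uniform on (-pi, pi)."""
    h = 2.0 * math.pi / steps
    m0 = m1 = 0.0
    for i in range(steps):
        s = math.sin(-math.pi + (i + 0.5) * h)
        w = math.exp(t * s)
        m0 += w
        m1 += s * w
    return m0 / steps, m1 / steps

def cgf(t):
    """Lambda(t) = ln E[exp(t sin xi)]."""
    return math.log(_moments(t)[0])

def cgf_prime(t):
    """Lambda'(t) = E[sin(xi) exp(t sin xi)] / E[exp(t sin xi)]."""
    m0, m1 = _moments(t)
    return m1 / m0

def solve_t0(rho, iters=80):
    """Bisection for t0 > 0 with t0 * Lambda'(t0) = rho / (1 - rho).
    Relies on t * Lambda'(t) being increasing in t > 0 (Lambda is convex)."""
    target = rho / (1.0 - rho)
    lo, hi = 0.0, 1.0
    while hi * cgf_prime(hi) < target:  # expand the bracket until it contains t0
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * cgf_prime(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for rho in (0.2, 0.5, 0.8):
        t0 = solve_t0(rho)
        # Leading factor t0 / rho of N_* in Example 5.2, up to ln Q / theta.
        print(rho, t0, t0 / rho)
```

The printed ratio $t_0/\rho$ is the constant multiplying $\ln Q_{\alpha,\pi}/\theta$ in the asymptotic for $N_*$, so it can be tabulated against $\rho$ to see how the split between $n$ and $m$ affects the required sample size.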

Example 5.3 (Gamma distribution). Under the setup in Section 5.2, suppose
for $\theta \in [0,1]$, $P_\theta =$ gamma$(1+\theta, 1)$, whose density is
$p_\theta(u) = u^\theta e^{-u}/\Gamma(1+\theta)$, $u > 0$. Then

$$r_\theta(u) = \frac{u^\theta}{\Gamma(1+\theta)}, \qquad
\dot\ell_\theta(u) = \ln u - \psi(\theta+1), \qquad
\ddot\ell_\theta(u) = -\psi'(\theta+1),$$

where $\psi(z) = \Gamma'(z)/\Gamma(z)$ is the digamma function. Let
$c = \psi(1)$. By the notations in (5.3), $X$ and $Y$ are iid
$\sim \ln\omega - c$, with $\omega \sim P_0$. It follows that $X$ has density
$f(x) = e^{x+c}p_0(e^{x+c}) = e^{x+c}\exp(-e^{x+c})$, $x \in \mathbb R$, which
is bounded and continuous, and hence conditions b) and c) of Theorem 5.1 are
satisfied with $\lambda = 1$ and $\zeta(z) = 1$ in (5.7). Since

$$E_0[e^{tX}] = \int_{-\infty}^\infty e^{tx}e^{x+c}\exp(-e^{x+c})\,dx
= \int_0^\infty z^t e^c \exp(-e^c z)\,dz = \frac{\Gamma(t+1)}{e^{tc}} < \infty,
\quad \text{any } t > -1,$$

condition a) is satisfied. To verify d), the density of $X - Y$ at $u > 0$ is

$$g(u) = \int_{-\infty}^\infty e^{2c+u+2x}\exp[-(1+e^u)e^{c+x}]\,dx
= e^u\int_0^\infty z\exp[-(1+e^u)z]\,dz = \frac{e^u}{(1+e^u)^2}$$

(substitute $z = e^{c+x}$). Similarly, for $s > 0$,

$$k(s,u) := \int_{-\infty}^\infty e^{s(2x+u)}e^{2c+u+2x}
\exp[-(1+e^u)e^{c+x}]\,dx
= e^{(1+s)u-2sc}\int_0^\infty z^{1+2s}\exp[-(1+e^u)z]\,dz
= \frac{\Gamma(2+2s)\,e^{(1+s)u}}{e^{2sc}(1+e^u)^{2+2s}}.$$

As a result, for $s \le 1/2$,

$$E_0[e^{s(X+Y)} \mid X - Y = u] = \frac{k(s,u)}{g(u)}
= \Gamma(2+2s)\,\frac{e^{su}}{e^{2sc}(1+e^u)^{2s}}.$$

Likewise,

$$E_0[e^{-s(X+Y)} \mid X - Y = u]
= \Gamma(2-2s)\,\frac{e^{2sc}(1+e^u)^{2s}}{e^{su}}.$$

Since $e^{s|X+Y|} \le e^{s(X+Y)} + e^{-s(X+Y)}$, it is not hard to see that we
can choose $s = 1/2$ and $L > 0$ large enough, such that (5.8) holds.

By $\Lambda(t) = \ln\Gamma(t+1) - \psi(1)t$, $t_0 > 0$ is the solution to
$t[\psi(t+1) - \psi(1)] = \rho/(1-\rho)$. By $\int f^2 = g(0) = 1/4$,

$$\int_{-\infty}^\infty z\,h_0(z)\,dz
= 4\int_{-\infty}^\infty z\,e^{2z+2c}\exp(-2e^{z+c})\,dz,$$

which equals $\psi(2) - \ln 2 - \psi(1)$. By $\psi(z) = (\ln\Gamma(z))'$ and
$\Gamma(z+1) = z\Gamma(z)$, $\psi(z+1) - \psi(z) = 1/z$. Therefore,
$K_f = 1 - \ln 2$. So by (5.9), $N_* \sim (1/\theta) \times
\ln Q_{\alpha,\pi}/[(1-\rho)\Lambda'(t_0) + 2\rho(1 - \ln 2)]$.
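
The constants in Example 5.3 can be evaluated with a few lines of code. The sketch below implements the digamma function with the standard recurrence $\psi(x) = \psi(x+1) - 1/x$ plus an asymptotic series, solves $t[\psi(t+1) - \psi(1)] = \rho/(1-\rho)$ by bisection, and returns the denominator $(1-\rho)\Lambda'(t_0) + 2\rho(1-\ln 2)$ of the asymptotic for $N_*$. All function names are hypothetical, and the series truncation is an assumption that gives roughly ten correct digits.

```python
import math

EULER_GAMMA = 0.5772156649015329  # equals -psi(1)

def digamma(x):
    """psi(x) for x > 0: shift x above 10 with psi(x) = psi(x + 1) - 1/x,
    then apply ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def cgf_prime(t):
    """Lambda'(t) = psi(t + 1) - psi(1) for Lambda(t) = ln Gamma(t + 1) - psi(1) t."""
    return digamma(t + 1.0) + EULER_GAMMA

def solve_t0(rho, iters=100):
    """Bisection for t0 > 0 with t0 * (psi(t0 + 1) - psi(1)) = rho / (1 - rho)."""
    target = rho / (1.0 - rho)
    lo, hi = 0.0, 1.0
    while hi * cgf_prime(hi) < target:  # expand the bracket until it contains t0
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * cgf_prime(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K_F = 1.0 - math.log(2.0)  # K_f = psi(2) - ln 2 - psi(1)

def rate_denominator(rho):
    """(1 - rho) Lambda'(t0) + 2 rho K_f, the denominator in the N_* asymptotic."""
    t0 = solve_t0(rho)
    return (1.0 - rho) * cgf_prime(t0) + 2.0 * rho * K_F

if __name__ == "__main__":
    for rho in (0.2, 0.5, 0.8):
        print(rho, solve_t0(rho), rate_denominator(rho))
```

For $\rho = 1/2$ the equation reads $t[\psi(t+1) - \psi(1)] = 1$, which is solved by $t_0 = 1$ because $\psi(2) - \psi(1) = 1$; this gives a quick sanity check of the solver.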
6. Summary

Multiple testing is often used to identify subtle real signals (false nulls)
from a large and relatively strong background of noise (true nulls). In order
to have some assurance that there is a reasonable fraction of real signals
among the signals "spotted" by a multiple testing procedure, it is useful to
evaluate the pFDR of the procedure. Compared with FDR control, pFDR control is
more subtle and in general requires more data. In this article, we study the
minimum number of observations per null in order to attain a target pFDR level
and show that it depends on several factors: 1) the target pFDR control level,
2) the proportion of false nulls among the nulls being tested, 3)
distributional properties of the data in addition to mean and variance, and 4)
in the case of multiple F tests, the number of covariates included in the
nulls.

The results of the article indicate that, in determining how much data are
needed for pFDR control, if there is little information about the data
distributions, then it may be useful to estimate the cumulant generating
functions of the distributions. Alternatively, if one has good evidence about
the parametric form of the data distributions but has little information on
the values of the parameters, then it may be necessary to determine the number
of observations per null based on the cumulant generating functions as well.
In either case, it is typically insufficient to use only the means and
variances of the distributions.

The article only considers univariate test statistics, which allow detailed
analysis of tail probabilities. It is possible to test each null by more than
one statistic. How to determine the number of observations per null for
multivariate test statistics is yet to be addressed.
Appendix: Mathematical Proofs
A1. Proofs for normal t-tests



Proof of Lemma \2.1\ Part 1) is clear. To show 2), let (n,r) — > (oo,0) such that 
nr — ► a > 0. Since S = y/n + 1 r — ► 0, by (|2.3j) . it suffices to show 

^y^e 8 . (Al.l) 
By Stirling's formula, $\Gamma(z) = (z/e)^z\sqrt{2\pi/z}\,[1 + O(1/z)]$. Then for $n \gg 1$,

O-n.k < 2 



n + fc + l\ ( " +fe+1)/2 /n + l^- ( " +1)/2 



2e / V 2e 



giving 



a?1 , fc (V2£) fc 2Q/2j)fe / n + fc + l \ fc/2 

m - »! i 2 ; (A12) 

_ 2[(n + l)(n + 1 + k)r 2 \ k l 2 3(w + r) fe (l + fc) fc / 2 

~ fc! " fc! ■ 

The right hand side has a finite sum over fc. By dominated convergence, 

lim L(n, r) = lim n,k(V~ ) 

(n,r)— ►(oo,0) f—' (n,r)— >(oo,0) fc! 

s.t. nr — >a fc— s _^ nr — >a 

= f> Um [(n+l)(n + l + fc)r 2 ] fc / 2 _ ~ a k _ ^ 

(n,r)-(oo,0) fc! ^ fc! 

K— u s.t. nr— »a K— u 

This yields 2). To show 3), by similar argument, given < c < 1, for n 3> 1, 
a„,fc(%/2£) fe > c(V26) k /^fl_\ fc/2 > c(nr) k 



fc! fc! V 2 / - fc! 

Therefore, as nr — > oo, i(n, r) > ce nr — > oo. □ 



Proof of Lemma 12.21 

By Stirling's formula, there is a constant C > 1, such that fc fe / 2 /fc! < C fc /r(fc/2+ 
1) for all fc 3> 1. Fix n so that C 2 a 2 /n < A and (|A1.2[) holds for all n > n - For 
fc > n (n + l), l + fc/(n + l) < k/n . Then applying (|A1.2|) with <5 = \Jn + 1 sr 
yields 



a„, fc (V2(5) fe 2[(n + l)(n + l + fc)s 2 r 2 ] fc / 2 2(fc/7i ) fc / 2 (nsr - 



fc! 



< 



2C* 



r(fc/2 + 1) 



fc! 

(ns + s) 2 r 2 
n 



fe/2 



< 



fc! 

2[6(s)r 2 ] fc / 2 

r([fc/2j + i)' 



sr) fe 



where 6(a) = C 2 (ns + s) 2 /n . Let A* € (C 2 a 2 /n , A). By / e Ar G(dr) < oo 



G(dr) < oo for any p > 0. Let (n, s) — > (oo, 0) such that ns 



Then 
^ g n ^2{n + l)st) k ~ 2{b{ S )r 2 ) k ' 2 
^ W ~ ^ T{[k/2\ + 1) 



k—kn 



k—k( 



<2(l + ^Kr) 



(A,r 2 ) fc 

ifei 



fc=Lfc /2j 

By the above inequality and dominated convergence, 



lim / L(n, sr) G{dr) = J L(n, sr) G{dr) = / e ar G{dr) 

A2. Proofs for F-tests 
Proof of Lemma 3.1

It suffices to show 4>'(t) > for t > 0, where 

bp,n,kt 



□ 



fc=0 



This follows from b p ^ n ^+i > &p,n,fc and 
Next, recall 



kl 



k=0 



[bp,n,k+l - b p .n 7 k]t k ^ , ( 



fc=0 



fc-i 



K(p, n, 5) = e 



_ p -(n+p)8 2 /2 



En 

k=0 j=0 



n + p + 2j \ J_ 
P + 2j ) X kl 



(n + p)S 2 



Proof of Lemma WM Suppose S — > oo and n = n(S) such that rii^ae [0, oo). 
Since (n + p + 2j)/(p + 2j) < n + p, then 



oo 1 

K{p,n,S)<Y J (n+p) k - 



k=0 



{n + p)5 2 



< (n+p) 2 5 2 /2 



(A2.1) 



and by dominated converge, 



fe-i 



lim i<T(p, n, 8) = lim V TT 



5— *oo ' 

fc=0 j=0 

oo A;— 1 



l + 2j/(n + p) 
p + 2j 



kl 



(n+p) 2 5 2 



fc=0 j=0 



1 /a 



p + 2j J kl V 2 



M p (a). 
Next suppose $\delta \to 0$ and $n\delta \to \infty$. Then one gets



K(p,n,6)>e-^ s /2 J2U 

oo k— 1 



fe— 1/ \ . / r-9\ fc 

1 / no z 



> e -(« + ^/ 2En i 



2j 



p + 2jj k\ 

k 



1 /n^ fc 



-(ra+p)<5 2 /2 



E 



(2fc)! V P 



1 fn 2 5 2 



k\ 



-(n+p)<5 2 /2 



*/\^ _i_ P ~ nS /Vp 



□ 



Because (n + p)5 2 = o(nS) and ri(5 — > oo, the right hand side tends to oo. The 
proof is thus complete. 

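Proof of Lemma 3.3. First, one gets

$$K(p,n,\delta) = e^{-(n+p)\delta^2/2}\sum_{k=0}^\infty \prod_{j=0}^{k-1}
\left[\frac{1+2j/(n+p)}{1+2j/p}\right]\frac{1}{k!}
\left[\frac{(n+p)^2\delta^2}{2p}\right]^k
\le e^{-(n+p)\delta^2/2}\sum_{k=0}^\infty \frac{1}{k!}
\left[\frac{(n+p)^2\delta^2}{2p}\right]^k
= \exp\left[\frac{n(n+p)\delta^2}{2p}\right].$$

Thus, by dominated convergence, $K(p,n,\delta) \to e^a$ as
$n(n+p)\delta^2/(2p) \to a$.

Now let $a > 0$. Regard $f(n) = n(n+p)\delta^2/(2p)$ as a quadratic function of
$n$. Then in order to get $f(n) \to a$,

$$n \sim \frac{4pa}{\delta^2 p + \sqrt{\delta^4p^2 + 8\delta^2pa}} \sim
\begin{cases}
(1/\delta)\sqrt{2pa} & \text{if } \delta^2 p \to 0, \\[1ex]
2a/\delta^2 & \text{if } \delta^2 p \to \infty, \\[1ex]
\dfrac{4a/\delta^2}{1+\sqrt{1+8a/L}} & \text{if } \delta^2 p \to L > 0.
\end{cases}$$

The proof is thus complete. □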
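
The three regimes for $n$ can be checked against the exact positive root of $n(n+p)\delta^2/(2p) = a$. The sketch below uses the cancellation-free form $4pa/(\delta^2p + \sqrt{\delta^4p^2 + 8\delta^2pa})$ of the root; the function names and the sample values of $(\delta, p, a)$ are illustrative assumptions.

```python
import math

def n_exact(delta, p, a):
    """Positive root n of n (n + p) delta^2 / (2 p) = a, written so that no
    subtraction of nearly equal numbers occurs."""
    d2p = delta * delta * p
    return 4.0 * p * a / (d2p + math.sqrt(d2p * d2p + 8.0 * d2p * a))

def n_small(delta, p, a):
    """Approximation when delta^2 p -> 0."""
    return math.sqrt(2.0 * p * a) / delta

def n_large(delta, p, a):
    """Approximation when delta^2 p -> infinity."""
    return 2.0 * a / (delta * delta)

def n_fixed(delta, p, a):
    """Approximation when delta^2 p -> L > 0, with L taken as delta^2 p."""
    L = delta * delta * p
    return (4.0 * a / (delta * delta)) / (1.0 + math.sqrt(1.0 + 8.0 * a / L))

if __name__ == "__main__":
    print(n_exact(1e-4, 5.0, 2.0), n_small(1e-4, 5.0, 2.0))
    print(n_exact(10.0, 1e6, 2.0), n_large(10.0, 1e6, 2.0))
    print(n_exact(0.1, 300.0, 2.0), n_fixed(0.1, 300.0, 2.0))
```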

In order to prove Lemma 3.4, we need the following result.

Lemma A2.1. Given $0 < \epsilon < 1$, there is $\lambda(\epsilon) > 0$, such
that

$$e^{-A}\sum_{|k-A| \ge \epsilon A} \frac{A^k}{k!}
= O\left(e^{-\lambda(\epsilon)A}\right), \quad \text{as } A \to \infty.$$

Proof. Let $Y$ be a Poisson random variable with mean $A$. Then

$$e^{-A}\sum_{|k-A| \ge \epsilon A} \frac{A^k}{k!}
= P(|Y - A| \ge \epsilon A).$$

By LDP (@), $I := -\lim (1/A)\ln P(|Y - A| \ge \epsilon A) > 0$. Then given
$\lambda(\epsilon) \in (0, I)$,
$P(|Y - A| \ge \epsilon A) \le e^{-\lambda(\epsilon)A}$ for all $A \gg 0$,
implying the stated bound. □
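
The exponential decay in Lemma A2.1 can be observed directly. The sketch below sums both Poisson tails in log space and compares the result with the standard Chernoff bounds $\exp\{-A[(1+\epsilon)\ln(1+\epsilon) - \epsilon]\}$ and $\exp\{-A[(1-\epsilon)\ln(1-\epsilon) + \epsilon]\}$ for the upper and lower tails. These bounds are a well-known substitute for the LDP argument and are not taken from the paper; function names are hypothetical.

```python
import math

def poisson_log_pmf(k, A):
    """ln P(Y = k) for Y ~ Poisson(A)."""
    return -A + k * math.log(A) - math.lgamma(k + 1)

def two_sided_tail(A, eps):
    """P(|Y - A| >= eps * A) for Y ~ Poisson(A), by direct summation."""
    lower = sum(math.exp(poisson_log_pmf(k, A))
                for k in range(0, math.floor(A * (1 - eps)) + 1))
    upper = 0.0
    k = math.ceil(A * (1 + eps))
    while True:
        term = math.exp(poisson_log_pmf(k, A))
        # Terms decrease for k > A; stop once they no longer change the sum.
        if term <= upper * 1e-18:
            break
        upper += term
        k += 1
    return lower + upper

def chernoff_bound(A, eps):
    """Sum of the standard Chernoff bounds for both tails, 0 < eps < 1."""
    up = math.exp(-A * ((1 + eps) * math.log(1 + eps) - eps))
    down = math.exp(-A * ((1 - eps) * math.log(1 - eps) + eps))
    return up + down

if __name__ == "__main__":
    eps = 0.5
    for A in (50, 100, 200, 400):
        tail = two_sided_tail(A, eps)
        # -ln(tail)/A estimates the decay rate in the lemma.
        print(A, tail, chernoff_bound(A, eps), -math.log(tail) / A)
```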

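Proof of Lemma 3.4. Fix $\delta > 0$ and $n$. Then, with
$A = (n+p)\delta^2/2$,

$$K(p,n,\delta) = e^{-A}\sum_{k=0}^\infty \prod_{j=0}^{k-1}
\left(\frac{n+p+2j}{p+2j}\right)\frac{A^k}{k!}.$$

Let $0 < \epsilon < 1$. For each $k$,
$\prod_{j=0}^{k-1}[(n+p+2j)/(p+2j)] \le (1+n/p)^k$. Then

$$e^{-A}\sum_{|k-A| \ge \epsilon A} \prod_{j=0}^{k-1}
\left(\frac{n+p+2j}{p+2j}\right)\frac{A^k}{k!}
\le e^{-A}\sum_{|k-A| \ge \epsilon A} \frac{[(1+n/p)A]^k}{k!}.$$

Denote $B = (1+n/p)A$. Then given any $0 < \epsilon' < \epsilon$, for all
$p \gg 1$, $|k - A| \ge \epsilon A$ implies $|k - B| \ge \epsilon' B$. By
Lemma A2.1, as $p \to \infty$,

$$e^{-A}\sum_{|k-A| \ge \epsilon A} \frac{[(1+n/p)A]^k}{k!}
\le e^{-A}\sum_{|k-B| \ge \epsilon' B} \frac{B^k}{k!}
= e^{B-A}\,O\left(e^{-\lambda(\epsilon')B}\right) = o(1),$$

where $\lambda(\epsilon') > 0$ is a constant and $B - A = nA/p$ remains
bounded. It follows that

$$K(p,n,\delta) = e^{-A}\sum_{|k-A| \le \epsilon A} \prod_{j=0}^{k-1}
\left(1 + \frac{n}{p+2j}\right) \times \frac{A^k}{k!} + o(1).$$

By $\ln(1+x) = x + O(x^2)$ as $x \to 0$, it is seen that

$$K(p,n,\delta) = e^{-A}\sum_{|k-A| \le \epsilon A} (1 + r_k)
\exp\left[\sum_{j=0}^{k-1}\frac{n}{p+2j}\right] \times \frac{A^k}{k!} + o(1),$$

where $\sup_{|k-A| \le \epsilon A} |r_k| \to 0$ as $p \to \infty$. It is not
hard to see that for all $p \gg 1$ and $k$ with $|k - A| \le \epsilon A$,
$|k/p - \delta^2/2| \le \epsilon\delta^2$. As a result,

$$(1 + r_k)\exp\left[\sum_{j=0}^{k-1}\frac{n}{p+2j}\right]
= [1 + r'_k(\epsilon)]\exp\left[n\int_0^{\delta^2/2}\frac{dx}{1+2x}\right]
= [1 + r'_k(\epsilon)](1+\delta^2)^{n/2},$$

where $\sup_{|k-A| \le \epsilon A} |r'_k(\epsilon)| \to 0$ as $p \to \infty$
followed by $\epsilon \to 0$. Combining the above approximations and applying
Lemma A2.1 again,

$$K(p,n,\delta) = [1 + R(\epsilon)](1+\delta^2)^{n/2}\,
e^{-A}\sum_{|k-A| \le \epsilon A} \frac{A^k}{k!} + o(1)
= [1 + R(\epsilon)](1+\delta^2)^{n/2} + o(1),$$

where $R(\epsilon) \to 0$ as $p \to \infty$ followed by $\epsilon \to 0$. Let
$p \to \infty$. Since $\epsilon$ is arbitrary, then
$K(p,n,\delta) \to (1+\delta^2)^{n/2}$. □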



A3. General t tests

A3.1. Proof of the main result

This section is devoted to the proof of Proposition 4.1. Write

$$\Lambda^*(u) = \sup_t\,[ut - \Lambda(t)], \qquad
\Psi^*(u) = \sup_t\,[ut - \Psi(t)],$$

$$\eta_\Lambda(u) = (\Lambda')^{-1}(u), \qquad
\eta_\Psi(u) = (\Psi')^{-1}(u), \tag{A3.1}$$

whenever the functions are well defined. The lemma below collects some useful
properties of $\Lambda$. The proof is standard and hence omitted for brevity.

Lemma A3.1. Suppose condition a) in Theorem 4.1 is fulfilled. Then the
following statements on $\Lambda$ are true.

1) $\Lambda$ is smooth on $\mathcal D_\Lambda^o$, strictly decreasing on
$(-\infty, 0) \cap \mathcal D_\Lambda$, and strictly increasing on
$(0, \infty) \cap \mathcal D_\Lambda$.

2) $\Lambda'$ is strictly increasing on $\mathcal D_\Lambda$, and so
$\eta_\Lambda = (\Lambda')^{-1}$ is well defined on
$I_\Lambda = (\inf\Lambda', \sup\Lambda')$, where the extrema are obtained over
$\mathcal D_\Lambda$. Moreover, $\Lambda'(0) = 0$, $(\Lambda')^{-1}(0) = 0$,
and $t\Lambda'(t) \to \infty$ as $t \uparrow \sup\mathcal D_\Lambda$.

3) $\Lambda^*$ is smooth and strictly convex on $I_\Lambda$, and

$$(\Lambda^*)'(u) = \eta_\Lambda(u) = \arg\sup_t\,[ut - \Lambda(t)], \qquad
u \in I_\Lambda.$$

On the other hand, $\Lambda^*(u) = \infty$ on
$(-\infty, \inf\Lambda') \cup (\sup\Lambda', \infty)$.
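
As a concrete instance of the objects in (A3.1), take the normal case $\Lambda(t) = t^2/(2\sigma^2)$ from Example 5.1, for which $\Lambda^*(u) = \sigma^2u^2/2$ and $\eta_\Lambda(u) = \sigma^2 u$. The sketch below computes the Legendre transform and its maximizer by ternary search and can be compared with these closed forms. The search interval and function names are assumptions for illustration, and the routine presumes that $ut - \Lambda(t)$ is concave with its maximizer inside the interval.

```python
def legendre(cgf, u, lo=-50.0, hi=50.0, iters=200):
    """Return (sup_t [u t - cgf(t)], argsup) by ternary search on [lo, hi].
    Assumes t -> u t - cgf(t) is concave with an interior maximizer."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if u * m1 - cgf(m1) < u * m2 - cgf(m2):
            lo = m1
        else:
            hi = m2
    t = 0.5 * (lo + hi)
    return u * t - cgf(t), t

if __name__ == "__main__":
    sigma = 2.0
    normal_cgf = lambda t: t * t / (2.0 * sigma ** 2)
    for u in (0.1, 0.7, 1.5):
        value, t_star = legendre(normal_cgf, u)
        # Closed forms: Lambda*(u) = sigma^2 u^2 / 2, eta_Lambda(u) = sigma^2 u.
        print(u, value, sigma ** 2 * u * u / 2.0, t_star, sigma ** 2 * u)
```

The maximizer returned by the search plays the role of $\eta_\Lambda(u) = (\Lambda^*)'(u)$ in part 3) of the lemma.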

The next lemma is key to the proof of Proposition 4.1. Basically, it says that
the analysis on the ratio of the extreme tail probabilities can be localized
around a specific value determined by $\Lambda$ and the index $\lambda$ in
(4.4). As a result, the limit (4.7) can be obtained by the uniform exact large
deviations principle (LDP) in (1), which is a refined version of the exact LDP
(0).
Lemma A3.2. Let $m, n \to \infty$, such that $m/N \to \rho \in (0,1)$, where
$N = m + n$. Let $\nu_0 = \Lambda'(t_0)$, where $t_0 > 0$ is the unique
positive solution to (4.6). Under conditions a) and b) of Theorem 4.1, given
$D > 0$ and $\delta > 0$, there are $z_0 > 0$ and $\eta > 0$, such that for
$z \ge z_0$,

$$\lim_{N\to\infty} \frac{1}{N} \inf_{|s| \le D/N}
\ln P\left(\bar X_n + s > zS_m,\ |zS_m - \nu_0| \le \delta\right)
\ge -J_z(\nu_0) \tag{A3.2}$$

and

$$\sup_{|s| \le D/N} \left|
\frac{P(\bar X_n + s > zS_m)}
{P(\bar X_n + s > zS_m,\ |zS_m - \nu_0| \le \delta)} - 1 \right|
= O(e^{-\eta N}), \tag{A3.3}$$

where $J_z(\nu_0) = (1-\rho)\Lambda^*(\nu_0) + \rho\Psi^*(\nu_0^2/z^2)
< \infty$.

Assume Lemma A3.2 is true for now. The main result is shown next.

Proof of Proposition 4.1. Recall that $d_N \to 0$ and $N \to \infty$, such that
$d_N N \to T$. First, we show that, given $\epsilon > 0$, there is $z_0 > 0$,
such that

$$\lim_{N\to\infty} \left|
\frac{P(\bar X_n + d_N > zS_m)}{P(\bar X_n > zS_m)\,e^{(1-\rho)Tt_0}} - 1
\right| \le \epsilon, \quad \text{all } z \ge z_0. \tag{A3.4}$$

Let $\delta \in (0,1)$ such that $\eta_\Lambda(u)$ is well defined on
$[\nu_0 - \delta, \nu_0 + \delta]$ and

$$\sup_{|u - \nu_0| \le \delta} |\eta_\Lambda(u) - \eta_\Lambda(\nu_0)|
\le \frac{\ln(1+\epsilon)}{(1-\rho)T}.$$

Let $z_0 > 0$ and $\eta > 0$ such that (A3.3) holds. Fix $z \ge z_0$. Denote
$a = a(z) = (\nu_0 - \delta)/z$ and $b = b(z) = (\nu_0 + \delta)/z$. Because of
(A3.3), in order to show (A3.4), it suffices to establish

$$\lim_{N\to\infty} \left|
\frac{P(\bar X_n + d_N > zS_m,\ a \le S_m \le b)}
{P(\bar X_n > zS_m,\ a \le S_m \le b)\,e^{(1-\rho)Tt_0}} - 1
\right| \le \epsilon. \tag{A3.5}$$

Let $G_m(x)$ be the distribution function of $S_m$. Then

$$P(\bar X_n + d_N > zS_m,\ a \le S_m \le b)
= \int_a^b P(\bar X_n > zx - d_N)\,G_m(dx),$$

$$P(\bar X_n > zS_m,\ a \le S_m \le b)
= \int_a^b P(\bar X_n > zx)\,G_m(dx).$$

From these equations, it is not hard to see that (A3.5) follows if we can show

$$\lim_{N\to\infty} \sup_{x \in [a,b]} \left|
\frac{P(\bar X_n > zx - d_N)}{P(\bar X_n > zx)\,e^{(1-\rho)Tt_0}} - 1
\right| \le \epsilon. \tag{A3.6}$$

To establish (A3.6), observe that for $N \gg 1$ large enough and
$x \in [a,b]$, $zx - d_N \in [a/2, \nu_0 + \delta]$. Therefore,
$t_N(x) := \eta_\Lambda(zx - d_N)$ is not only well defined but also continuous
and strictly positive on $[a,b]$. By Theorem 3.3 of ([2), as $N \to \infty$,
the following approximation holds,

$$\sup_{x \in [a,b]} \left| e^{n\Lambda^*(zx - d_N)}\,t_N(x)
\sqrt{2\pi n\Lambda''(t_N(x))}\,P(\bar X_n > zx - d_N) - 1 \right| = o(1),$$

which is a uniform version of the exact LDP due to Bahadur and Rao (0,
Theorem 3.7.4).

Because $t_N(x) \to \eta_\Lambda(zx)$ uniformly on $[a,b]$ and the latter is
strictly positive and continuous on $[a,b]$, the above approximation yields

$$\sup_{x \in [a,b]} \left| e^{n\Lambda^*(zx - d_N)}\,\eta_\Lambda(zx)
\sqrt{2\pi n\Lambda''(\eta_\Lambda(zx))}\,P(\bar X_n > zx - d_N) - 1 \right|
= o(1).$$

Likewise,

$$\sup_{x \in [a,b]} \left| e^{n\Lambda^*(zx)}\,\eta_\Lambda(zx)
\sqrt{2\pi n\Lambda''(\eta_\Lambda(zx))}\,P(\bar X_n > zx) - 1 \right|
= o(1).$$

By the above approximations to $P(\bar X_n > zx - d_N)$ and
$P(\bar X_n > zx)$, in order to prove (A3.6), it is enough to show

$$M := \lim_{N\to\infty} \sup_{x \in [a,b]} \left|
\frac{e^{-n\Lambda^*(zx - d_N)}}{e^{-n\Lambda^*(zx) + (1-\rho)Tt_0}} - 1
\right| \le \epsilon.$$

By Taylor expansion and Lemma A3.1,

$$\Lambda^*(zx - d_N) = \Lambda^*(zx) - d_N\,\eta_\Lambda(zx - \xi d_N),$$

where $\xi = \xi(x) \in (0,1)$. Therefore,

$$\frac{e^{-n\Lambda^*(zx - d_N)}}{e^{-n\Lambda^*(zx) + (1-\rho)Tt_0}}
= \frac{e^{-n(\Lambda^*(zx - d_N) - \Lambda^*(zx))}}{e^{(1-\rho)Tt_0}}
= \frac{e^{n d_N \eta_\Lambda(zx - \xi d_N)}}{e^{(1-\rho)Tt_0}}.$$

Since $n d_N \to (1-\rho)T$ and $\eta_\Lambda(zx - \xi d_N) \to
\eta_\Lambda(zx)$ uniformly for $x \in [a,b]$,

$$M = \sup_{x \in [a,b]} \left| e^{(1-\rho)(\eta_\Lambda(zx) - t_0)T} - 1
\right|.$$

Because $t_0 = \eta_\Lambda(\nu_0)$ and
$zx \in [\nu_0 - \delta, \nu_0 + \delta]$ for $x \in [a,b]$,

$$M \le \exp\left[(1-\rho)T \sup_{u \in [-\delta,\delta]}
|\eta_\Lambda(\nu_0 + u) - \eta_\Lambda(\nu_0)|\right] - 1 \le \epsilon.$$

Therefore (A3.5) is proved.

Now that (A3.4) holds for any given $\epsilon > 0$, as long as
$z \ge z_0 = z_0(\epsilon)$, with $z_0$ being large enough, by the diagonal
argument, we can choose $z_N > 0$ in such a way that $z_N \to \infty$ slowly as
$N \to \infty$ and

$$\lim_{N\to\infty} \left| e^{-(1-\rho)Tt_0}\,
\frac{P(\bar X_n + d_N > z_N S_m)}{P(\bar X_n > z_N S_m)} - 1 \right| = 0.$$

This finishes the proof of the theorem. □
The proof needs a few preliminary results. The first lemma collects some useful
properties of $\Psi$.

Lemma A3.3. Let $\mathcal D_\Psi = \{t : \Psi(t) < \infty\}$. Under condition
b) in Theorem 4.1, the following statements on $\Psi$ are true.

1) $\mathcal D_\Psi \supset (-\infty, 0]$. $\Psi$ is smooth and strictly
increasing on $\mathcal D_\Psi^o$. Furthermore, $\Psi(t) \to -\infty$ as
$t \to -\infty$.

2) $\Psi'$ is strictly increasing on $\mathcal D_\Psi^o$, and so
$\eta_\Psi = (\Psi')^{-1}$ is well defined on $I_\Psi = (0, \sup\Psi')$, where
the supremum is obtained over $\mathcal D_\Psi^o$. In addition,
$\inf\Psi' = 0$ and $\sup\Psi' \ge \Psi'(0-) = \sigma^2$. Furthermore,

$$\lim_{u \to 0+} u\,\eta_\Psi(u) = -\frac{\lambda+1}{2}, \tag{A3.7}$$

where $\lambda$ is given in (4.4).

3) $\Psi^*$ is smooth and strictly convex on $I_\Psi$, and

$$(\Psi^*)'(u) = \eta_\Psi(u) = \arg\sup_t\,[ut - \Psi(t)], \qquad
u \in I_\Psi.$$

Furthermore, $\Psi^*$ is strictly decreasing on $(0, \sigma^2)$ with
$\Psi^*(u) \to \infty$ as $u \downarrow 0$ and $\to 0$ as
$u \uparrow \sigma^2$; and is nondecreasing for $u \ge \sigma^2$.

Proof. We only show $\Psi(t) \to -\infty$ as $t \to -\infty$ and (A3.7), which
are properties specifically due to condition b) in Theorem 4.1. The proof of
the rest of Lemma A3.3 is standard.

To get &(t) — > —00 as t — > 00, it suffices to show / °° e~ tu ^ 2 g(u) du — > as 
t — > 00. For later use, it will be shown that, given s > 0, 

x s e- tx2/2 g(x) dx^O, as i ^ 00. (A3.8) 

The proof is based on several truncations of the integral. Given < rj < 1, 
there is < e < 1, such that 

Since M e = sv^\ x \ >e g{x) < oo, given 5 > 0, as t — > 00, 

/ x s e- tx2 ' 2 g[x) dx < e~ u2 / 4 M e / ^e^ 4 dx = o{e~^^). 
On the other hand, 

c s e~ tx2 / 2 g{x) dx>{l-rj) f x s+x e - tx2/2 ((l/x) dx 

Jo 

> (1 - r?)C(l/e) f x s+x e~ tx2/2 dx. 



The right hand side is of the same order as L°° x s+x e tx2 / 2 dx, which in turn is 
of the same order as f-( x + s + 1 )/ 2 . As a result, 



DC 

a;" 



s e- te2/2 ff (a;)da; = (1 + o(l)) / x s e - te2/2 <7(x) dx, as t oo. 



g(x)/[x x ((l/x)} - 1 G for a; G (0, e) and tj is arbitrary, it is seen 

that in order to prove (|A3.8[) . it suffices to show 

x s+x e~ tx2/2 C(l/x) dx->0, as t -> oo. (A3.9) 



Let a = e 2 /2 and <p(x) = ((y/x/2). By variable substitution x = y/2u/t, 

c s+x e- tx2 / 2 ((l/x)dx = 2 p t- ( - p+ V / u p e~ u <p{t/u)du, (A3. 10) 

Jo 

where p = (s + A - l)/2 > -1. Therefore, (|A"3~9|) will follow if 

r (p+i) / u P e -»^(t/ u ) <fa -f 0, ast-^oo, (A3.ll) 



Note that 0(x) is increasing and since u p e u is integrable, there is M > 1, 
such that J™ u p e~ u du < r\ J Q M u p e~ u du. Then 



ta />oo 

u p e~ u (j)(t/u) du < <f>(t/M) / M p e-" 

M JAf 

< ry / u p e- u (j)(t/u) du. 



(A3.12) 



Fix S G (0, 1) such that < - r ? P +1 r;). Then 

/ u p e~ u (t){t/u)du = V / u p e- u <j>(t/u)du 
Jo k =i Jsk+1 

oo -i 



^ I ^ ) du - 



Note that is slowly varying at oo. For t large enough, 4>[t/u) < r/(f>(t / (5u)) 
for u G [?7, 1]. By induction, <j>(t/(S k uj) < rj k ~ 1 <f)(t / u) , k > 1. Consequently, by 
the selection of S and the above infinite sum, 



-$ oo -i 

/ vPe~ u 4>{t/u)du<y"5 {lp+1)k J 1 k - 1 u p <t>{t/u)du 
Jo k=l Js 



(A3. 13) 

u p (j>(t/u) du < r\ I u p e u (f>(t/u)du. 



1 - 5p +1 t] J s 

Now given < 5 < M < oo, as <j) is increasing and slowly varying at oo, 

inf = MM} ; 1 su = ; 1 

5<u<M <j){t) <j>{t) * ' 6<u<M 00) 0(0 

Therefore, 

M i-M 

u p e~ u (j){t/u)du = (1 + o(l))0(t) / u p e~ u du, as i -> oo. (A3. 14) 
Combine (|A3.12|) - (|A3.14|) and note 5 and M are arbitrary. Then 

ta poo 

u p e- u <t>{t/u)du={\ + o{l))<t>{t) u p e~ u du 

Ja (A3.15) 

= (1 +o(l))<j>(t)T(p+ 1), asi->oo. 

Note 0(t) = o(t p+1 ) as i ->■ oo. Therefore, (|A3.11|) is proved. 

Next we prove (|A3.7j) . For u > small enough, r]^{u) is well defined. Let 
£ = —rj-&(u). Then it = and t — > oo as it J. 0. Therefore, it suffice to 

demonstrate t) — ► (A + l)/2, as i — > oo. It is easy to see 



1 />oo / />oo 

^(-t) = iy x 2 e' tx2/2 g(x)dx / J e~ tx2/2 g(x)dx, 



for f > 0. 



Following the argument leading to (|A3.9[) , it suffices to show that, given 
A > 0, 

T a; A e -to a /2^(i/ a; ) da; = + f x 2+A e- te2 / 2 C(l/x) dx 

Jo A + 1 7 

as i — > oo. Denoting p = (A + l)/2, by (1A3.10|) . the above limit will follow if 

ta -i , /-i \ />ia 

u p - 1 e- u <f>(t/u)du= ' u p e- u ct)(t/u)du, t^oo. 



P 

However, this is implied by (|A3.15[) and T{p + 1) = pF(p). □ 

Lemma A3.4. Given $\rho \in (0,1)$, let $\nu_0 = \Lambda'(t_0)$, where
$t_0 > 0$ is the positive solution to (4.6). Then under conditions a) and b) of
Theorem 4.1, for any $\delta \in (0, \nu_0)$, there are $z_0 > 0$ and $a > 0$,
such that for $z \ge z_0$,

$$\inf_{|u - \nu_0| \ge \delta}
\left\{(1-\rho)\Lambda^*(u) + \rho\Psi^*(u^2/z^2)\right\}
\ge (1-\rho)\Lambda^*(\nu_0) + \rho\Psi^*(\nu_0^2/z^2) + a.$$
\U-U a \>d 
Proof. The infimum on the left hand side increases as 5 decreases. Since vn < 
sup A', without loss of generality, let 8 < sup A' — Uq. For z > 0, write 

H z {u) = (1 — p)A*(u) + p^*(u 2 /z 2 ) 

Then by Lemma[MU for u E (0,a 2 z 2 ) n (0,supA'), 

H' z (u) = (1 - p) VA (u) + ^ m {u 2 /z 2 ) (A3.16) 

z A 

For any 77 G (0, v — 8) and M € (t'o + <S, sup A'), by (|A3.7|) , as z — > 00, 

uH' z {u) — > /i(u) := (1 — p)wq^(u) — p(A + 1), uniformly on [77, M]. 

Since ft, is strictly increasing on [0,oo), is the only positive solution to 
h(u) = 0. Therefore, there is an > 0, such that 

inf h(u) > ao, sup < — ao- 

u-i/ >5/2 u-vo<-«/2 

Let a = (a /2)min{l n5 ^±^,lnrf?}. As z - 00, fl» - fc(u)/u uni- 

formly on [77, M]. Since h(u) > for u 6 [z/q,M], and h(u)/u > ao/M for 
u£ [i/q + <5, M] , it can be seen that for all z > large enough and it € [^0 + 8, M] , 

#,(«) - h z { VQ ) = r H' Z ( S ) ds > \ r ^d S >^ r ~> a . 

Ju * Jv +6/2 s * Jv +5/2 s 

Likewise, for all z > large enough and u G [77, vn — 8], 

r° an r°- 5 / 2 ds 

H z {u) - H z {v ) = / [-H' z (s)} ds>^- —> a . 

Ju 2 J u s 

To finish the proof, it suffices to show that there are M S (fo,supA') and 
77 G (0, vq), such that for all z > large enough, H z (u) is strictly increasing on 
(M, 00) and strictly decreasing on (0,77). 

First, given z > large enough, by Lemma I A3. 31 H z (u) is increasing for 
u > zcr 2 and equal to 00 for it > sup A'. As a result, it is only necessary to 
consider u < M' := min(sup A', zcr 2 ). Note that if sup A' < 00, then for all 
z > large enough, M' = sup A'; whereas if sup A' = 00, M' = za 2 . 

Let (p(u) = (u 2 jz 2 )T]^{u 2 jz 2 ). For u e {y ,M'), > <p(u) > C := 
inf o<u<a 2 [ Uf l^( u )] > —00. By Lemma I A3. 11 there is i^o < M < sup A' such 
that (1 - p)Mt)a(M) > -2pC + 1. Then by (|A3.16|) and the fact that ur/ A (u) is 
strictly increasing for u S (0,supA'), H' z (u) > 1/u > for it £ (M, M'). Then 
77 z is strictly increasing on (M, M'). 

Second, as it J. 0, utia(u) — > and it 2 77*(u 2 ) — > — (A + l)/2 < 0. Therefore, by 
(|A3.16jl . there is 5 S (0, z/o), such that for all 2 > large enough and it G (0, 8), 
uH' z {u) < -p(X + l)/4. Then ff^(u) < for u < 5 and hence H z (u) is strictly 
decreasing. This finishes the proof. □ 
Proof of Lemma \A3.B . Since the left hand side of (|A3.2j) is increasing in S, 
without loss of generality, assume S £ (0, Vq). Let z > a 2 /{vq + 5). Given 
z > z a and e £ (0, 5), for N > D/e and s £ [-D/N, D/N] C (-e, e), 

P [X n + s > zS m , \zS m -vq\<6) 

> P (X n + s > zS m , \zS m - i/ 1 < e) 

> P {X n > i/ + 2e, |zS m - i/ | < e) 

= P (X„ > i/ + 2e) P (i/ - e < zS m < vq + e) . 

Observe that for < a < b, a < zS m < b is equivalent to mo? j z 2 < 
E™=i(>2fc-i - Y 2k ) 2 /2 < mb 2 /z 2 . Also, A*(t) is increasing on (0,oo), 
is decreasing on (0, a 2 ), and (vq + e) 2 / z 2 < a 2 . Therefore by LDP, 

lim — inf P (X n +s> zS m , \zS m - vq\ < 5) 

N^oo N \s\<D/N v 

> lim ■ilnP(X„>i/ +2e)+ lim -j- InP {\zS m - i/ | < e) 

N-^oo iv iv— »oo iv 

= (1 - p)A*(i/ + 2c) + p** ((i/o + e) 2 /^ 2 )- 

Because e is arbitrary and A* and \&* are continuous, (|A3.2|) is proved. 

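Consider (A3.3) now. By Lemma A3.4, there is $\eta > 0$, such that for all $z \ge z_0$ and $u \in [0, \nu_0 - \delta/2] \cup [\nu_0 + \delta/2, \infty)$,

$$(1-\rho)\Lambda^*(u) + \rho\Psi^*(u^2/z^2) \ge J_z(\nu_0) + 2\eta. \tag{A3.17}$$

Let

$$
R_- = \sup_{|s| \le D/N} P(\bar X_n + s \ge zS_m,\ zS_m \le \nu_0 - \delta), \qquad
R_+ = \sup_{|s| \le D/N} P(\bar X_n + s \ge zS_m,\ zS_m \ge \nu_0 + \delta).
$$

Since the left hand side of (A3.3) is no greater than

$$\frac{R_- + R_+}{\inf_{|s| \le D/N} P(\bar X_n + s \ge zS_m,\ |zS_m - \nu_0| \le \delta)},$$

by (A3.2), in order to establish (A3.3), it suffices to show that for $z \ge z_0$,

$$\liminf_{N\to\infty} \frac{1}{N} \ln \frac{1}{R_-} \ge J_z(\nu_0) + \eta, \qquad \liminf_{N\to\infty} \frac{1}{N} \ln \frac{1}{R_+} \ge J_z(\nu_0) + \eta.$$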

For any $0 < u \le \nu_0 - \delta$, by (A3.17), there is $r = r(u) \in (0, u/3)$, such that

$$(1-\rho)\Lambda^*(u - 2r) + \rho\Psi^*\big((u + r)^2/z^2\big) \ge J_z(\nu_0) + \eta.$$

By $\Psi^*(u) \uparrow \infty$ as $u \downarrow 0$, there is $r_0 \in (0, \nu_0 - \delta)$, such that $\rho\Psi^*(r_0^2/z^2) \ge J_z(\nu_0) + \eta$. Because $I = [0, \nu_0 - \delta]$ is compact, one can choose $u_0 = 0$ and $u_1, \ldots, u_p \in I$, such that $I \subset \bigcup_{i=0}^p [u_i - r_i, u_i + r_i]$, with $r_i = r(u_i)$ for $i \ge 1$.

It can be seen that, for $N > D/\min(\epsilon, r_0, r_1, \ldots, r_p)$, $R_- \le \sum_{i=0}^p A_i$, where $A_0 = P(zS_m \le r_0)$ and $A_i = P(\bar X_n \ge u_i - 2r_i,\ |zS_m - u_i| \le r_i)$, $i \ge 1$. For the latter ones, by the choice of $z$ and $r_i$, $u_i - 2r_i > 0$ and $(u_i + r_i)^2/z^2 < \sigma^2$. Therefore, by the LDP,

$$\lim_{N\to\infty} \frac{1}{N} \ln \frac{1}{A_i} = (1-\rho)\Lambda^*(u_i - 2r_i) + \rho\Psi^*\big((u_i + r_i)^2/z^2\big) \ge J_z(\nu_0) + \eta.$$

Similarly, $\lim (1/N) \ln(1/A_0) \ge J_z(\nu_0) + \eta$. Since there is only a finite number of $A_i$, $\lim (1/N) \ln(1/R_-) \ge J_z(\nu_0) + \eta$. Likewise, $\lim (1/N) \ln(1/R_+) \ge J_z(\nu_0) + \eta$. The proof is thus complete. □



A4. Tests involving likelihood

A4.1. Proof of the main result

This section is devoted to the proof of Proposition 5.1. The proof is based on several lemmas. Henceforth, let $N = m + n$ and $\nu_0 = \Lambda'(t_0)$, where $t_0$ is the positive solution to (4.6). It will be assumed that as $m \to \infty$ and $n \to \infty$, $m/N \to \rho \in (0, 1)$, where $\rho$ is fixed.

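Lemma A4.1. Let $\delta \in (0, \nu_0/2)$ and $\epsilon > 0$. There are $z_0 > 0$ and $\theta_0 = \theta_0(z)$, such that given $z \ge z_0$, as $m \to \infty$ and $n \to \infty$,

$$\sup_{0 \le \theta \le \theta_0} \frac{P_\theta(\bar X_n \ge zS_m,\ |zS_m - \nu_0| > \delta)}{P_\theta(\bar X_n \ge zS_m,\ |zS_m - \nu_0| \le \delta)} \to 0, \tag{A4.1}$$

$$\sup_{0 \le \theta \le \theta_0} \frac{P_\theta(\bar X_n \ge (1+\epsilon)zS_m,\ |zS_m - \nu_0| \le \delta)}{P_\theta(\bar X_n \ge zS_m,\ |zS_m - \nu_0| \le \delta)} \to 0. \tag{A4.2}$$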



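Lemma A4.2. Let $a_1 > 0$. Under the conditions of Theorem 5.1, for any $\epsilon > 0$, there are $m_0 > 0$ and $\delta > 0$, such that

$$\sup_{0 < |t| \le \delta} \left| E_0\big(e^{a\bar U_m} \mid S_m = t\big) - e^{aK_f} \right| \le \epsilon e^{aK_f}, \qquad m \ge m_0,\ 0 \le a \le a_1,$$

where $E_0$ is expectation under $P_0$, $K_f$ is defined as in Theorem 5.1 and $\bar U_m = (1/m) \sum_{i=1}^m U_i$ with $U_i = (Y_{2i-1} + Y_{2i})/2$.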

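Proof of Proposition 5.1. We shall show that for any $b > 0$, there is $z_0 = z_0(b)$, such that for all $z \ge z_0$,

$$\limsup \left| \frac{P_{\theta_N}(\bar X_n \ge zS_m)}{P_0(\bar X_n \ge zS_m)} - L \right| \le b, \tag{A4.3}$$

where $L = \exp\{(1-\rho)T\Lambda'(t_0) + 2\rho T K_f\}$, and the limit is taken as $m \to \infty$, $n \to \infty$ and $\theta_N \to 0$, such that $\theta_N N \to T > 0$ and $m/N \to \rho \in (0, 1)$. This together with a diagonal argument then finishes the proof.

Let $\epsilon > 0$ and $\delta \in (0, \nu_0/2)$, such that Lemma A4.2 holds with $a = 2\rho T$. Fix $z_0 > (\nu_0 + \delta)/\delta$ as in Lemma A4.1. Then, given $z \ge z_0$, in order to show (A4.3), it is enough to show

$$\limsup \left| \frac{P_{\theta_N}(\mathcal{E}_{m,n})}{P_0(\mathcal{E}_{m,n})} - L \right| \le b, \tag{A4.4}$$

where $\mathcal{E}_{m,n} = \{\bar X_n \in [zS_m, (1+\epsilon)zS_m],\ |zS_m - \nu_0| \le \delta\}$. For $\theta \in [0, 1]$,

$$\frac{dP_\theta}{dP_0} = e^{J_n(\theta) + mZ_m(\theta)},$$

where

$$J_n(\theta) = \sum_{i=1}^n \ln r_\theta(\omega_i), \qquad Z_m(\theta) = \frac{1}{m} \sum_{i=1}^m \big[\ln r_\theta(\omega'_{2i-1}) + \ln r_\theta(\omega'_{2i})\big].$$

Since $\ln r_\theta(\omega_i) = \ell_\theta(\omega_i) - \ell_0(\omega_i)$ and $\ell_0'(\omega_i) = X_i$, by Taylor's expansion,

$$J_n(\theta) = n\theta\bar X_n + \frac{\theta^2}{2} \sum_{i=1}^n \ell''_{s\theta}(\omega_i), \quad \text{for some } s \in (0, 1).$$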

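Let $B = \sup_\theta \|\ell''_\theta(\omega)\|_{L^\infty(P_0)}$. By Condition 3, $B < \infty$. Since $\theta_N N \to T$, $n\theta_N \to (1-\rho)T$ and $|J_n(\theta_N) - n\theta_N\bar X_n| \le nB\theta_N^2/2 = O(1/N)$. On $\mathcal{E}_{m,n}$,

$$|\bar X_n - \nu_0| \le |\bar X_n - zS_m| + |zS_m - \nu_0| \le \epsilon zS_m + \delta \le \epsilon_1 := \epsilon(\nu_0 + \delta) + \delta.$$

It follows that for $m$ and $n$ large enough,

$$
\begin{aligned}
|J_n(\theta_N) - (1-\rho)T\nu_0| &\le \epsilon + |n\theta_N - (1-\rho)T|\,\bar X_n + (1-\rho)T\,|\bar X_n - \nu_0|\\
 &\le \epsilon_2 := \epsilon + \epsilon(\nu_0 + \epsilon_1) + (1-\rho)T\epsilon_1.
\end{aligned}
$$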



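Denote $Q_N = E_0\big[e^{mZ_m(\theta_N)} \mid \mathcal{E}_{m,n}\big]$. We obtain

$$e^{(1-\rho)T\nu_0 - \epsilon_2}\, Q_N \le \frac{P_{\theta_N}(\mathcal{E}_{m,n})}{P_0(\mathcal{E}_{m,n})} \le e^{(1-\rho)T\nu_0 + \epsilon_2}\, Q_N. \tag{A4.5}$$

Let $A_m = \{|zS_m - \nu_0| \le \delta\}$. Since the samples $(\omega_i)$ and $(\omega'_i)$ are independent, $Q_N = E_0\big[e^{mZ_m(\theta_N)} \mid A_m\big]$.

Let $\bar U_m$ be defined as in Lemma A4.2. By Taylor's expansion,

$$
\begin{aligned}
mZ_m(\theta) &= \theta \sum_{i=1}^m (Y_{2i-1} + Y_{2i}) + \frac{\theta^2}{2} \sum_{i=1}^m \big[\ell''_{s\theta}(\omega'_{2i-1}) + \ell''_{s\theta}(\omega'_{2i})\big]\\
 &= 2m\theta\bar U_m + \frac{\theta^2}{2} \sum_{i=1}^m \big[\ell''_{s\theta}(\omega'_{2i-1}) + \ell''_{s\theta}(\omega'_{2i})\big], \quad \text{some } s \in (0, 1).
\end{aligned}
$$

Then for $m$ large enough, $|mZ_m(\theta_N) - 2m\theta_N\bar U_m| \le BT^2/m < \epsilon$, yielding $e^{-\epsilon} \le Q_N / E_0\big(e^{2m\theta_N\bar U_m} \mid A_m\big) \le e^{\epsilon}$. On $A_m$, $S_m \le (\nu_0 + \delta)/z < \delta$, so by Lemma A4.2, $1 - \epsilon \le E_0\big(e^{2m\theta_N\bar U_m} \mid A_m\big)/e^{2m\theta_N K_f} \le 1 + \epsilon$. By combining these bounds with (A4.5), we thus obtain

$$(1-\epsilon)\, e^{-\epsilon - \epsilon_2 + 2(m\theta_N - \rho T)K_f} \le \frac{P_{\theta_N}(\mathcal{E}_{m,n})}{L\, P_0(\mathcal{E}_{m,n})} \le (1+\epsilon)\, e^{\epsilon + \epsilon_2 + 2(m\theta_N - \rho T)K_f}.$$

Because $\epsilon$ and $\epsilon_2$ are arbitrary and $m\theta_N \to \rho T$, (A4.3) is proved. □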



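A4.2. Proof of Lemmas

We need the next result to show Lemma A4.1.

Lemma A4.3. Given $a \in (0, 1)$ and $\epsilon > 0$, there is $\theta_0 > 0$, such that

$$\sup_{0 \le \theta \le \theta_0} P_\theta(\mathcal{E}) \le P_0(\mathcal{E})^{1-a} e^{\epsilon k}, \qquad \inf_{0 \le \theta \le \theta_0} P_\theta(\mathcal{E}) \ge P_0(\mathcal{E})^{1/(1-a)} e^{-\epsilon k} \tag{A4.6}$$

for all $k \ge 1$ large enough and $\mathcal{E} \subset \Omega^k$. Furthermore, let $\mathcal{E}_k \subset \Omega^k$ be events such that $\liminf (1/k) \ln P_0(\mathcal{E}_k) > -\infty$. Then

$$\lim_{\theta_0 \to 0} \limsup_{k \to \infty} \frac{1}{k} \sup_{0 \le \theta \le \theta_0} \left| \ln \frac{P_\theta(\mathcal{E}_k)}{P_0(\mathcal{E}_k)} \right| = 0.$$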



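Proof. Given $a \in (0, 1)$, let $\theta' = \theta'(a)$ be as in Condition 4. Denote $\omega = (\omega_1, \ldots, \omega_k)$. For each $\theta \in [0, \theta']$, $k \ge 1$, and $\mathcal{E} \subset \Omega^k$, by Hölder's inequality,

$$
\begin{aligned}
P_\theta(\mathcal{E}) &= E_0\big[\mathbf{1}\{\omega \in \mathcal{E}\}\, r_\theta(\omega_1) \cdots r_\theta(\omega_k)\big]\\
 &\le \big[P_0(\omega \in \mathcal{E})\big]^{1-a} \big\{E_0\big[r_\theta(\omega_1)^{1/a} \cdots r_\theta(\omega_k)^{1/a}\big]\big\}^{a}\\
 &= P_0(\mathcal{E})^{1-a} \big\{E_0\big[r_\theta(\omega_1)^{1/a}\big]\big\}^{ka}.
\end{aligned}
$$

Therefore, given $\theta_0 \in (0, \theta')$,

$$\sup_{0 \le \theta \le \theta_0} P_\theta(\mathcal{E}) \le P_0(\mathcal{E})^{1-a} \exp\Big\{ka \ln E_0\Big[\sup_{\theta \le \theta_0} r_\theta(\omega_1)^{1/a}\Big]\Big\}.$$

Likewise, letting $q = 1/a - 1$,

$$P_0(\mathcal{E}) \le P_\theta(\mathcal{E})^{1-a} \big\{E_\theta\big[r_\theta(\omega_1)^{-1/a}\big]\big\}^{ka} = P_\theta(\mathcal{E})^{1-a} \big\{E_0\big[r_\theta(\omega_1)^{-q}\big]\big\}^{ka}.$$

Since $q > 0$, the above bound yields

$$\inf_{0 \le \theta \le \theta_0} P_\theta(\mathcal{E}) \ge P_0(\mathcal{E})^{1/(1-a)} \exp\Big\{-\frac{ka}{1-a} \ln E_0\Big[\Big(\inf_{\theta \le \theta_0} r_\theta(\omega_1)\Big)^{-q}\Big]\Big\}.$$

Under $P_0$, for almost every $\omega \in \Omega$, $p_0(\omega) > 0$ and $p_\theta(\omega)$ is continuous in $\theta$. Let $\theta_0 \to 0$. Then $\sup_{\theta \le \theta_0} r_\theta(\omega) \to 1$ and $\inf_{\theta \le \theta_0} r_\theta(\omega) \to 1$. By (5.2) and dominated convergence,

$$\ln E_0\Big[\sup_{\theta \le \theta_0} r_\theta(\omega)^{1/a}\Big] \to 0, \qquad \ln E_0\Big[\Big(\inf_{\theta \le \theta_0} r_\theta(\omega)\Big)^{-q}\Big] \to 0.$$

This implies that for $\theta_0$ small enough, both of the inequalities in (A4.6) hold.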
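To show the second part of the lemma, for each $k \ge 1$,

$$\frac{1}{k} \ln P_\theta(\mathcal{E}_k) \le \frac{1-a}{k} \ln P_0(\mathcal{E}_k) + a \ln E_0\big[r_\theta(\omega)^{1/a}\big],$$

which yields

$$\frac{1}{k} \sup_{0 \le \theta \le \theta_0} \ln \frac{P_\theta(\mathcal{E}_k)}{P_0(\mathcal{E}_k)} \le a\Big\{-\frac{1}{k} \ln P_0(\mathcal{E}_k) + \ln E_0\Big[\sup_{\theta \le \theta_0} r_\theta(\omega)^{1/a}\Big]\Big\}.$$

Let $k \to \infty$ and take $\limsup$ on both ends. By the assumption,

$$\limsup_{k \to \infty} \frac{1}{k} \sup_{0 \le \theta \le \theta_0} \ln \frac{P_\theta(\mathcal{E}_k)}{P_0(\mathcal{E}_k)} \le a\Big\{M + \ln E_0\Big[\sup_{\theta \le \theta_0} r_\theta(\omega)^{1/a}\Big]\Big\},$$

where $M = -\liminf (1/k) \ln P_0(\mathcal{E}_k) \ge 0$. Likewise, with $q = 1/a - 1 > 0$,

$$\liminf_{k \to \infty} \frac{1}{k} \inf_{0 \le \theta \le \theta_0} \ln \frac{P_\theta(\mathcal{E}_k)}{P_0(\mathcal{E}_k)} \ge -\frac{a}{1-a}\Big\{M + \ln E_0\Big[\Big(\inf_{\theta \le \theta_0} r_\theta(\omega)\Big)^{-q}\Big]\Big\}.$$

Thus we get

$$\lim_{\theta_0 \to 0} \limsup_{k \to \infty} \sup_{0 \le \theta \le \theta_0} \frac{1}{k} \left| \ln \frac{P_\theta(\mathcal{E}_k)}{P_0(\mathcal{E}_k)} \right| \le \frac{aM}{1-a}.$$

Because $a$ is arbitrary, the lemma is proved. □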
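As a concrete illustration of the Hölder step in the proof of Lemma A4.3, the worked example below evaluates the moment factor in closed form. It assumes, purely for illustration and not as part of the paper's setting, that $P_\theta = N(\theta, 1)$ on the real line, so that the likelihood ratio is $r_\theta(x) = \exp(\theta x - \theta^2/2)$.

```latex
% Illustrative assumption (not from the source): P_theta = N(theta, 1).
% Uses the amsmath align* environment.
\begin{align*}
r_\theta(x) &= \frac{dP_\theta}{dP_0}(x)
  = \exp\!\left(\theta x - \tfrac{\theta^2}{2}\right),\\
E_0\!\left[r_\theta(\omega_1)^{1/a}\right]
 &= e^{-\theta^2/(2a)}\, E_0\!\left[e^{\theta\omega_1/a}\right]
  = \exp\!\left(\frac{\theta^2}{2a^2} - \frac{\theta^2}{2a}\right)
  = \exp\!\left(\frac{(1-a)\theta^2}{2a^2}\right),\\
P_\theta(\mathcal{E})
 &\le P_0(\mathcal{E})^{1-a}
    \left\{E_0\!\left[r_\theta(\omega_1)^{1/a}\right]\right\}^{ka}
  = P_0(\mathcal{E})^{1-a}
    \exp\!\left(\frac{(1-a)\theta^2}{2a}\,k\right).
\end{align*}
```

In this Gaussian sketch the first inequality in (A4.6) therefore holds for every $k \ge 1$ as soon as $\theta_0^2 \le 2a\epsilon/(1-a)$.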



It is easy to check that under the assumptions of Theorem 5.1, all the statements in Lemmas A3.1 and A3.3 hold for $\Lambda$ and $\Psi$ defined in (5.6), with $X = \ell_0'(\omega)$, $Y = \ell_0'(\omega')$. Therefore, Lemma A3.2 can be applied.

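Proof of Lemma A4.1. We first show (A4.1). By Lemma A3.2, there is $z_0 > 0$, such that for $z \ge z_0$ and $\delta \in (0, \nu_0/2)$, there is $\eta > 0$, such that

$$\frac{P_0(\mathcal{E}_{n,m} \cap A_m^c)}{P_0(\mathcal{E}_{n,m} \cap A_m)} = o(e^{-\eta M}), \tag{A4.7}$$

where $M = n + 2m$, $\mathcal{E}_{n,m} = \{\bar X_n \ge zS_m\}$ and $A_m = \{|zS_m - \nu_0| \le \delta\}$. Given $\epsilon \in (0, 1)$, by Lemma A4.3, there is $\theta_0 > 0$, such that for $\theta \in [0, \theta_0]$ and $m$, $n$ large enough, $P_\theta(\mathcal{E}) \le P_0(\mathcal{E})^{1-\epsilon} e^{\epsilon M}$ and $P_\theta(\mathcal{E}) \ge P_0(\mathcal{E})^{1/(1-\epsilon)} e^{-\epsilon M}$ for $\mathcal{E} \subset \Omega^M$. Since both $\mathcal{E}_{n,m}$ and $A_m$ are events in $\Omega^M$, then

$$
\begin{aligned}
L_{n,m} &:= \frac{1}{M} \sup_{0 \le \theta \le \theta_0} \ln \frac{P_\theta(\mathcal{E}_{n,m} \cap A_m^c)}{P_\theta(\mathcal{E}_{n,m} \cap A_m)}\\
 &\le \frac{1}{M} \ln \frac{P_0(\mathcal{E}_{n,m} \cap A_m^c)^{1-\epsilon}}{P_0(\mathcal{E}_{n,m} \cap A_m)^{1/(1-\epsilon)}} + 2\epsilon\\
 &= \frac{1-\epsilon}{M} \ln \frac{P_0(\mathcal{E}_{n,m} \cap A_m^c)}{P_0(\mathcal{E}_{n,m} \cap A_m)} + \frac{\epsilon(2-\epsilon)}{1-\epsilon} \cdot \frac{1}{M} \ln \frac{1}{P_0(\mathcal{E}_{n,m} \cap A_m)} + 2\epsilon.
\end{aligned}
$$

By equations (A3.2) and (A4.7), there is a finite constant $C > 0$, such that

$$\limsup L_{n,m} \le -(1-\epsilon)\eta + \frac{\epsilon(2-\epsilon)C}{1-\epsilon} + 2\epsilon.$$

Since $\epsilon$ is arbitrary, $\limsup L_{n,m} < 0$. This then finishes the proof of (A4.1).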
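The last equality in the bound on $L_{n,m}$ in the proof of (A4.1) rests on a short algebraic identity, spelled out here for convenience. The shorthand $x = P_0(\mathcal{E}_{n,m} \cap A_m^c)$ and $y = P_0(\mathcal{E}_{n,m} \cap A_m)$ is introduced only for this sketch.

```latex
% Shorthand (introduced for this sketch): x and y are the two P_0-probabilities,
% with 0 < epsilon < 1. Uses the amsmath align* environment.
\begin{align*}
\ln\frac{x^{1-\epsilon}}{y^{1/(1-\epsilon)}}
 &= (1-\epsilon)\ln x - \frac{1}{1-\epsilon}\ln y\\
 &= (1-\epsilon)\ln\frac{x}{y}
    + \left[(1-\epsilon) - \frac{1}{1-\epsilon}\right]\ln y\\
 &= (1-\epsilon)\ln\frac{x}{y}
    + \frac{\epsilon(2-\epsilon)}{1-\epsilon}\,\ln\frac{1}{y}.
\end{align*}
```

The last step uses $(1-\epsilon) - 1/(1-\epsilon) = -\epsilon(2-\epsilon)/(1-\epsilon)$. The first term is controlled by (A4.7) and the second by the lower bound (A3.2) on $y$.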

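It remains to show (A4.2). First, by the LDP for $\bar X_n$ under $P_0$ and an argument similar to the proof of (A4.1), it can be seen that given $r > 0$ and $0 < a < b < \sup\Lambda'$, there is $\theta_0 > 0$, such that

$$\sup_{0 \le \theta \le \theta_0} \frac{P_\theta(\bar X_n \ge b)}{P_\theta(\bar X_n \in [a, a + r])} \to 0, \quad \text{as } n \to \infty. \tag{A4.8}$$

Now let $a \in (0, \epsilon)$ and $\eta \in (0, (\delta/\nu_0) \wedge (a/2))$, so that $(1+\epsilon)(1-\eta) > 1 + a$. Denote $\mathcal{E}_m = \{|zS_m - \nu_0| \le \eta\nu_0\}$ and $A_m = \{|zS_m - \nu_0| \le \delta\}$. Then $\mathcal{E}_m \subset A_m$. By Lemma A3.2, given $z \gg 1$, there is $\theta_0 > 0$, such that

$$\inf_{0 \le \theta \le \theta_0} \frac{P_\theta(\bar X_n \ge zS_m,\ \mathcal{E}_m)}{P_\theta(\bar X_n \ge zS_m,\ A_m)} \to 1. \tag{A4.9}$$

For $\theta \le \theta_0$, by the independence of $\bar X_n$ and $S_m$ under $P_\theta$,

$$
\begin{aligned}
P_\theta(\bar X_n \ge (1+\epsilon)zS_m,\ \mathcal{E}_m) &\le P_\theta(\bar X_n \ge (1+\epsilon)(1-\eta)\nu_0,\ \mathcal{E}_m)\\
 &\le P_\theta(\bar X_n \ge (1+a)\nu_0)\, P_\theta(\mathcal{E}_m).
\end{aligned}
$$

By $\eta < a/2$, let $\epsilon' \in (0, \epsilon)$, such that $(1+\epsilon')(1+\eta) < 1 + a/2$. Let $I = [(1-\eta)\nu_0, (1+\epsilon')(1+\eta)\nu_0]$. It is not hard to find a finite number of nonempty intervals $[b_j, c_j) \subset I$, such that for any $x \in I$, $[x, (1+\epsilon')x]$ contains at least one $[b_j, c_j)$. Then

$$
\begin{aligned}
P_\theta(\bar X_n \in [zS_m, (1+\epsilon)zS_m],\ \mathcal{E}_m) &\ge P_\theta(\bar X_n \in [zS_m, (1+\epsilon')zS_m],\ \mathcal{E}_m)\\
 &\ge \min_j P_\theta(\bar X_n \in [b_j, c_j))\, P_\theta(\mathcal{E}_m).
\end{aligned}
$$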

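Since $c_j \le (1 + a/2)\nu_0$, by the above inequalities and (A4.8),

$$
\begin{aligned}
\sup_{\theta \le \theta_0} \frac{P_\theta(\bar X_n \ge (1+\epsilon)zS_m,\ \mathcal{E}_m)}{P_\theta(\bar X_n \ge zS_m,\ \mathcal{E}_m)}
 &\le \sup_{\theta \le \theta_0} \frac{P_\theta(\bar X_n \ge (1+\epsilon)zS_m,\ \mathcal{E}_m)}{P_\theta(\bar X_n \in [zS_m, (1+\epsilon)zS_m],\ \mathcal{E}_m)}\\
 &\le \max_j \sup_{\theta \le \theta_0} \frac{P_\theta(\bar X_n \ge (1+a)\nu_0)}{P_\theta(\bar X_n \in [b_j, c_j))} \to 0,
\end{aligned}
$$

yielding

$$\inf_{\theta \le \theta_0} \frac{P_\theta(\bar X_n \in [zS_m, (1+\epsilon)zS_m],\ \mathcal{E}_m)}{P_\theta(\bar X_n \ge zS_m,\ \mathcal{E}_m)} \to 1,$$

which, together with (A4.9), implies

$$
\inf_{\theta \le \theta_0} \frac{P_\theta(\bar X_n \in [zS_m, (1+\epsilon)zS_m],\ A_m)}{P_\theta(\bar X_n \ge zS_m,\ A_m)}
 \ge \inf_{\theta \le \theta_0} \frac{P_\theta(\bar X_n \in [zS_m, (1+\epsilon)zS_m],\ A_m)}{P_\theta(\bar X_n \ge zS_m,\ \mathcal{E}_m)} \cdot \inf_{\theta \le \theta_0} \frac{P_\theta(\bar X_n \ge zS_m,\ \mathcal{E}_m)}{P_\theta(\bar X_n \ge zS_m,\ A_m)}
$$

and hence (A4.2). □

Proof of Lemma A4.2. Let $\epsilon_0 = \epsilon \inf_{0 \le a \le a_1} e^{aK_f}$. We have to show that for $\delta > 0$ small enough and $m_0 > 0$ large enough,

$$\sup_{0 < |t| \le \delta} \left| E_0\big(e^{a\bar U_m} \mid S_m = t\big) - e^{aK_f} \right| \le \epsilon_0, \qquad m \ge m_0,\ 0 \le a \le a_1. \tag{A4.10}$$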



Let $V_i = (Y_{2i-1} - Y_{2i})/2$. Under $P_0$, $(U_i, V_i)$ are iid with density

$$\frac{P(U \in du,\ V \in dv)}{du\, dv} = 2f(u+v)f(u-v).$$
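The factor 2 in this density is the Jacobian of the linear change of variables from a pair of observations to $(U, V)$. A minimal derivation, assuming as the stated joint density implies that $Y_{2i-1}$ and $Y_{2i}$ are iid with density $f$ under $P_0$ (written $y_1$, $y_2$ below):

```latex
% Change of variables (y_1, y_2) -> (u, v); y_1, y_2 stand for Y_{2i-1}, Y_{2i}.
% Uses the amsmath align* and pmatrix environments.
\begin{align*}
u = \tfrac12(y_1 + y_2),\quad v = \tfrac12(y_1 - y_2)
 \;&\Longleftrightarrow\; y_1 = u + v,\quad y_2 = u - v,\\
\left|\det\frac{\partial(y_1, y_2)}{\partial(u, v)}\right|
 &= \left|\det\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}\right| = 2,\\
\frac{P(U \in du,\ V \in dv)}{du\,dv}
 &= f(y_1)\,f(y_2)\,
    \left|\det\frac{\partial(y_1, y_2)}{\partial(u, v)}\right|
  = 2f(u+v)\,f(u-v).
\end{align*}
```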

Denote $\zeta = (v_1, \ldots, v_m)$ and

$$\phi_v(z) = E_0(e^{zU} \mid V = v).$$

Then

$$E_0\big(e^{a\bar U_m} \mid S_m = t\big) = \int \prod_{i=1}^m \phi_{v_i}(a/m)\, \mathbf{1}\{v_i \ne 0\}\; P(d\zeta \mid S_m = t). \tag{A4.11}$$



Case i: $f$ is bounded. In this case, $g(v) = \int f(u+v)f(u-v)\, du$ is well defined for all $v \in \mathrm{sppt}(V)$, $h_v(u) = f(u+v)f(u-v)/g(v)$ is the conditional density of $U$ given $V = v$ and $\phi_v(z) = \int e^{zu} h_v(u)\, du$. Since $f$ is continuous almost everywhere and bounded, by condition a) of Theorem 5.1, there is $r > 0$ such that $\sup_v \int e^{r|u|} f(u+v)f(u-v)\, du < \infty$, and by dominated convergence, as $v \to 0$, $g(v) \to g(0) = \int f^2 \in (0, \infty)$. It follows that there is $c > 0$, such that $\{\phi_v(z),\ v \in [-c, c]\}$ is a family of smooth functions of $z \in [-r, r]$ with uniformly continuous and bounded $\phi_v'(z)$ and $\phi_v''(z)$.
Given $\eta > 0$, decrease $c$ if necessary so that

sup \0i k \z)~4 k) (z)\ < -2-, fc = l,2, 






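where $I = [-c, c] \times [-r, r]$. By Taylor's expansion,

$$\phi_v(z) - \phi_0(z) = [\phi_v'(0) - \phi_0'(0)]\,z + \tfrac{1}{2}[\phi_v''(\theta z) - \phi_0''(\theta z)]\,z^2, \quad (v, z) \in I,$$

where $\theta = \theta(v, z) \in (0, 1)$. Then there is $m_0 > 0$, such that for all $m \ge m_0$, $a \in [0, a_1]$ and $v \in [-c, c]$, one gets $a/m \in [-r, r]$,

$$|\phi_v(a/m) - \phi_0(a/m)| \le \frac{2\eta}{3m} \le \frac{\eta}{m} \inf_{0 \le a \le a_1} \phi_0(a/m)$$

and hence

$$1 - \frac{\eta}{m} \le \frac{\phi_v(a/m)}{\phi_0(a/m)} \le 1 + \frac{\eta}{m}, \quad \text{all } a \in [0, a_1]. \tag{A4.12}$$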

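Given $\delta \in (0, c)$, for $0 < t \le \delta$, rewrite (A4.11) as

$$E_0\big(e^{a\bar U_m} \mid S_m = t\big) = \int \prod_{i \in J} \phi_{v_i}(a/m) \prod_{i \notin J} \phi_{v_i}(a/m)\; P(d\zeta \mid S_m = t),$$

where $J = \{i : |v_i| > c\}$. Let $s > 0$ and $L > 0$ be as in (5.8). For $m$ large enough, $a/m < s$, $a \in [0, a_1]$. Therefore, by Hölder's inequality, for $i \in J$,

$$
\begin{aligned}
\phi_{v_i}(a/m) &\le [\phi_{v_i}(s)]^{a/(sm)} \le L^{a/(sm)} \exp\Big(\frac{aL|v_i|}{sm}\Big),\\
\phi_{v_i}(a/m) &\ge \frac{1}{\phi_{v_i}(-a/m)} \ge L^{-a/(sm)} \exp\Big(-\frac{aL|v_i|}{sm}\Big).
\end{aligned}
$$

Let $p = |J|/m$. By the above first set of inequalities and Schwartz inequality,

$$\prod_{i \in J} \phi_{v_i}(a/m) \le L^{ap/s} \exp\Big(\frac{aL}{sm} \sum_{i \in J} |v_i|\Big) \le L^{ap/s} \exp\Big(\frac{aL\sqrt{p}}{s} \Big(\frac{1}{m} \sum_{i=1}^m v_i^2\Big)^{1/2}\Big).$$

Likewise, by the above second set of inequalities and Schwartz inequality,

$$\prod_{i \in J} \phi_{v_i}(a/m) \ge L^{-ap/s} \exp\Big(-\frac{aL\sqrt{p}}{s} \Big(\frac{1}{m} \sum_{i=1}^m v_i^2\Big)^{1/2}\Big).$$

Since $\{S_m = t\} = \{(1/m) \sum_{i=1}^m V_i^2 = t^2/4\}$,

$$L^{-ap/s} \exp\Big(-\frac{aLt\sqrt{p}}{2s}\Big) \le \prod_{i \in J} \phi_{v_i}(a/m) \le L^{ap/s} \exp\Big(\frac{aLt\sqrt{p}}{2s}\Big).$$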

Observe that, due to $0 < t \le \delta$, $S_m = t$ implies $p \le \delta^2/c^2$. Therefore, as long as $\delta$ is small enough, $ap/s$ is arbitrarily close to 0, and $aLt\sqrt{p}/(2s)$ is uniformly arbitrarily close to 0 for $0 \le a \le a_1$ and $0 < t \le \delta$. Consequently, for each $\zeta \in \{S_m = t\}$, $e^{-\eta} \le \prod_{i \in J} \phi_{v_i}(a/m) \le e^{\eta}$.

On the other hand, by (A4.12),

$$\Big(1 - \frac{\eta}{m}\Big)^{m(1-p)} \le \prod_{i \notin J} \frac{\phi_{v_i}(a/m)}{\phi_0(a/m)} \le \Big(1 + \frac{\eta}{m}\Big)^{m(1-p)}.$$

Thus, $e^{-2\eta} \phi_0(a/m)^{m(1-p)} \le E_0\big(e^{a\bar U_m} \mid S_m = t\big) \le e^{2\eta} \phi_0(a/m)^{m(1-p)}$ for all $t \in [-\delta, \delta] \setminus \{0\}$. Since $\eta$ and $p$ are arbitrarily small and $\phi_0(a/m)^m \to e^{aK_f}$ uniformly for $a \in [0, a_1]$ as $m \to \infty$, (A4.10) then follows.

Case ii: $f$ is symmetric and has a bounded support. In this case $B := \|U\|_{L^\infty(P_0)} < \infty$. By condition c) of Theorem 5.1, the density of $V$ is continuous and bounded on $(\epsilon, \infty)$ for any $\epsilon > 0$. Then $\phi_v(z)$ is well defined for all $z$ and $v \in \mathrm{sppt}(V) \setminus \{0\}$. Since $f$ is symmetric, for $v \in \mathrm{sppt}(V) \setminus \{0\}$,

$$\phi_v'(0) = \int u f(u+v)f(u-v)\, du \,/\, g(v) = 0,$$

and so $|\phi_v(a/m) - 1| \le |\phi_v''(\theta a/m)|\,(a/m)^2$, with $\theta \in (0, 1)$. Moreover,

$$|\phi_v''(s)| = \int u^2 e^{su} f(u+v)f(u-v)\, du \,/\, g(v) \le B^2 e^{|s|B}.$$

Then $|\phi_v(a/m) - 1| \le (a/m)^2 B_1$, where $B_1 = B^2 e^{(a/m)B}$. Then by (A4.11), $[1 - B_1(a/m)^2]^m \le E_0\big(e^{a\bar U_m} \mid S_m = t\big) \le [1 + B_1(a/m)^2]^m$, which implies (A4.10).

□
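The final implication in Case ii can be made explicit with two elementary bounds. The shorthand $x = B_1 a^2$ is introduced only for this sketch, with $m$ taken large enough that $x/m^2 \le 1$.

```latex
% Shorthand (introduced for this sketch): x = B_1 a^2 >= 0, with x / m^2 <= 1.
% Uses the amsmath align* environment.
\begin{align*}
\left(1 + \frac{x}{m^2}\right)^{m} &\le \exp\!\left(\frac{x}{m}\right)
 && \text{since } 1 + y \le e^{y},\\
\left(1 - \frac{x}{m^2}\right)^{m} &\ge 1 - \frac{x}{m}
 && \text{(Bernoulli's inequality)}.
\end{align*}
```

Since $B_1 \le B^2 e^{a_1 B}$ for $m \ge 1$, both bounds tend to 1 uniformly over $a \in [0, a_1]$ as $m \to \infty$. This matches (A4.10) provided $K_f = 0$ for symmetric $f$, which is an assumption here because the definition of $K_f$ in Theorem 5.1 is not reproduced in this appendix.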



